C’AR-TR-S93 


X0r)n]A9A]A)52] 
Augusi  199^ 


Extracting  Structure  from  Optical  Flow 
Using  the  Fast  Error  Search  Technique 


.  Sridliar  Srinivasaii 

(ViUci'  loi'  Auionialioij  l\<‘S('arcli 
I  nivcrsily  of  Maryland 
C'olleoo  Park,  MI)  20742-327') 


COMPUtER  VISION  LABORATORY 


CENTER  FOR  AUTOMATlbN  RES^RCH 


UNiVERSITY  OF  MARYLAND 


COLLEGE  PARK,  MARYLAND 
20742-3275 


CAR-TR-893 

CS-TR-3923 


N00014-95-1-0521 
August  1998 


Extracting  Structure  from  Optical  Flow 
Using  the  Fast  Error  Search  Technique 

Sridhar  Srinivasan 


Center  for  Automation  Research 
University  of  Maryland 
College  Park,  MD  20742-3275 


Abstract 

In  this  paper,  we  present  a  robust  and  computationally  efficient  technique  for  esti¬ 
mating  the  focus  of  expansion  (FOE)  of  an  optical  flow  field,  using  fast  partial  search. 
For  each  candidate  location  on  a  discrete  sampling  of  the  image  area,  we  generate  a 
linear  system  of  equations  for  determining  the  remaining  unknowns,  viz.  rotation  and 
inverse  depth.  We  compute  the  least  squares  error  of  the  system  without  actually  solving 
the  equations,  to  generate  an  error  surface  that  describes  the  goodness  of  fit  across  the 
hypotheses.  Using  Fourier  techniques,  we  prove  that  given  an  N  x  N  flow  field,  the 
FOE  can  be  estimated  in  0{N‘^  log  N)  operations.  Since  the  resulting  system  is  linear, 
bounded  perturbations  in  the  data  lead  to  bounded  errors. 

We  support  the  theoretical  development  and  proof  of  our  algorithm  with  experiments 
on  synthetic  and  real  data.  Through  a  series  of  experiments  on  synthetic  data,  we  prove 
the  correctness,  robustness  and  operating  envelope  of  our  algorithm.  We  demonstrate  the 
utility  of  our  technique  by  applying  it  to  the  problem  areas  of  3D  stabilization,  moving 
object  detection,  rangefinding,  obstacle  detection,  and  generation  of  3D  models  from 
video. 
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1  Introduction 


The  extraction  of  the  three-dimensional  structure  of  a  moving  scene  from  a  sequence  of 
images  is  termed  the  structure  from  motion  (SFM)  problem.  The  solution  to  this  problem 
is  a  key  step  in  the  computer  vision  tasks  of  robotic  navigation,  obstacle  avoidance,  time 
to  collision,  recognition  of  solid  objects,  and  surveillance.  Recent  interest  in  this  area 
has  been  sparked  by  the  desire  to  build  3D  models  from  video  for  virtual  reality,  video 
conferencing,  manufacturing  and  medical  applications.  The  proliferation  of  powerful 
computing  machinery  and  video  streams  has  transitioned  the  problem  from  a  purely 
theoretical  domain  to  one  of  practical  interest.  Mathematical  analysis  of  SFM  proves  the 
nonlinear  interdependence  of  structure  and  motion  given  observations  on  the  image  plane. 
While  this  problem  has  received  considerable  attention  by  researchers,  the  proposed 
solutions  tend  to  have  several  shortcomings.  Yet,  it  is  well  recognized  that  the  human 
visual  system  performs  the  task  of  simultaneously  estimating  motion  and  the  qualitative 
depth  structure  of  a  moving  scene  quite  efficiently  and  accurately. 

Psychophysical  experiments  on  human  subjects  reveal  that  the  first  stage  of  processing 
in  the  human  visual  system  is  the  estimation  of  the  motion  field.  Methods  for  estimating 
the  structure  from  the  motion  field  are  referred  to  as  differential  techniques,  as  opposed 
to  discrete  methods  which  rely  on  two  or  more  distinct  views  of  the  scene.  The  optical 
flow,  which  is  defined  as  the  2-D  projection  of  the  3D  motion  field,  is  composed  of  the 
horizontal  and  vertical  velocity  fields,  u(x,  y)  and  v(x,  y).  These  are  related  to  the  motion 
and  scene  depth  according  to 

u{x,y)  =  {-U  +  xt,s)g{x,y)  +  xyu)j:-{l  +  x^)LL)y  +  yu}2 

y)  =  i-ty  +  ytz)g{x,  y)  -h  (l  +  -  xyuy  -  (1) 

where  the  translational  and  rotational  motion  vectors  are  {tx,ty,tz)  and  {ujx^ujy^uiz)  re¬ 
spectively.  g{x,y)  =  llZ{x,y)  is  the  inverse  scene  depth,  and  all  linear  dimensions  are 

normalized  in  terms  of  the  focal  length  /  of  the  camera.  Such  a  pair  of  equations  ex¬ 

ists  for  every  point  (x,y)  on  the  image  plane  at  which  optical  flow  can  be  determined. 
Assuming  there  exist  M  such  points,  there  are  2M  equations  and  Af  -)-  5  independently 
determinable  unknowns.  Of  the  latter,  M  unknowns  are  related  to  scene  depth  and  the 
remaining  five  are  the  determinable  motion  parameters.  One  obvious  means  of  simplify¬ 
ing  the  system  is  to  eliminate  the  depth  map  y(x,  y)  from  the  above  equations.  Such  an 
operation  gives  rise  to  a  nonlinear  system  of  equations  in  five  determinable  unknowns. 
In  practice,  however,  the  elimination  of  depth  is  very  sensitive  to  noise  in  the  data  u,  v, 
which  is  compounded  by  the  nonlinear  nature  of  the  resulting  equations. 

The  pioneering  psychophysical  experiments  by  Wallach  and  O’Connell  [1]  and  Gibson 
[2,  3]  hypothesize  the  simultaneous  recoverability  of  structure  and  depth  from  an  optical 
flow  field.  This  is  quantified  and  substantiated  by  Ullman  [4],  Nakayama  and  Loomis 
[6],  and  Koenderink  and  van  Doom  [5].  The  early  approaches  by  Longuet-Higgins  and 
Prazdny  [7],  and  later  by  Waxman  and  Ullman  [8]  derive  closed-form  solutions  to  the  SFM 
problem  by  examining  derivatives  of  the  optical  flow  field,  with  an  underlying  assumption 
of  smoothness  of  the  3D  surface.  Bruss  and  Horn  [9]  and  Adiv  [10]  propose  nonlinear 
solutions  which  minimize  the  squared  error  between  the  observed  and  predicted  flows. 
While  both  the  approaches  are  iterative,  the  former  is  global  while  the  latter  subdivides 
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the  flow  field  into  smooth  patches  with  a  mechanism  that  allows  them  to  be  combined. 
Linear  approaches  to  SFM  can  be  traced  back  to  the  two- view  solution  formulated  by  Tsai 
and  Huang  [11]  based  on  point  correspondences.  Linear  methods  are  applied  to  motion 
fields  by  Zhuang  et  al.  [12,  13],  Waxman  et  al,  [14],  and  Mitiche  et  al.  [15].  Prazdny  [16] 
subdivides  the  SFM  problem  by  first  hypothesizing  the  rotational  component  of  motion, 
and  then  solving  for  the  remaining  variables  by  means  of  a  nonlinear  system  of  equations. 
In  a  related  manner,  Lawton  [17]  searches  for  the  translation  direction  at  discrete  points 
on  the  surface  of  a  unit  sphere  he  terms  the  translation  direction  sphere.  A  good  summary 
of  past  literature  and  algorithms  is  provided  in  the  books  by  Mitiche  [18]  and  Weng  et 
al.  [19]. 

Among  the  more  recent  SFM  literature  are  the  solutions  due  to  Jepson  and  Heeger 
[20]  and  Gupta  and  Kanal  [21],  both  of  which  eliminate  depth  and  rotation  variables  from 
(1)  to  solve  for  translation  using  a  linear  constraint.  Direct  techniques,  first  proposed 
by  Aloimonos  and  Brown  [22],  and  later  expanded  by  Negahdaripour  and  Horn  [23]  and 
Horn  and  Welden  [24],  altogether  bypass  computation  of  the  motion  field  by  operating 
on  raw  luminance  data.  Several  additional  cues  have  been  used  to  formulate  a  solution  to 
the  SFM  problem,  prominent  among  which  are  stereo  images,  stereo  flow  fields,  and  long 
sequences.  These  are  not  pertinent  to  the  current  work  and  are  not  discussed  further. 
One  important  constraint  that  has  gone  largely  unnoticed  (expect  as  a  means  of  resolving 
ambiguities  in  the  form  of  multiple  solutions)  is  the  non-negativity  of  depth.  This  has 
been  used  in  the  recent  approaches  in  a  direct  manner  by  Fermuller  and  Aloimonos 
[25,  26],  and  more  indirectly  by  Fejes  and  Davis  [27]. 

The  stability  of  algorithms  that  eliminate  the  depth  field  g{x,y)  from  (1)  [11,  12,  13, 
14,  15,  21]  is  questionable,  although  Gupta  and  Kanal  [21]  have  claimed  experimental  re¬ 
sults  demonstrating  some  amount  of  noise  resilience.  Moreover,  these  algorithms  discard 
valuable  information  in  the  form  of  nonlinear  equality  constraints  which  are  not  simple 
to  enforce.  Likewise,  assuming  smoothness  of  the  depth  field  or  optical  flow  field  is  not 
always  valid,  more  so  when  there  is  noise  in  the  flow  estimates.  Thus,  differentiating 
noisy  flow  fields  [7,  8]  in  order  to  solve  SFM  is  highly  undesirable.  On  the  other  hand, 
the  method  due  to  Fejes  and  Davis  [27]  requires  strong  variations  in  the  underlying  depth 
map  (and  thereby  in  the  optical  flow  field)  to  work  at  all!  Nonlinear  optimization-based 
solutions  [9,  10,  23]  are  relatively  stable  in  the  presence  of  noise.  However,  minimiz¬ 
ing  a  nonlinear  cost  function  exposes  the  solution  to  the  pitfalls  of  local  minima  and 
slow  convergence.  The  drawback  of  search-based  methods  [16,  17]  is  their  slow  speed  of 
execution. 

In  this  paper,  we  present  a  fast  partial  search  technique  for  locating  the  focus  of 
expansion  (FOE)  from  a  motion  field.  The  focus  of  expansion  is  hypothesized  to  lie  within 
a  bounded  square  on  the  image  plane.  For  each  candidate  location  on  a  discrete  sampling 
of  the  plane,  we  generate  a  linear  system  of  equations  for  estimating  the  remaining 
unknowns  which  are  the  rotational  velocities  and  inverse  depth  field.  We  compute  the 
least  squares  error  of  the  system  without  actually  solving  the  equations,  to  generate  an 
error  surface  that  describes  the  goodness  of  fit  as  a  function  of  the  hypothesized  focus 
of  expansion.  Our  technique  exploits  the  symmetry  of  (1)  and  uses  Fourier  techniques 
to  vastly  reduce  the  computational  burden  of  this  method  based  on  partial  search.  The 
TniniTTinm  of  the  error  surface  occurs  at  a  discrete  location  very  close  to  the  true  FOE.  For 
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an  image  of  size  N  xN  in  which  the  discrete  sampling  is  at  integer  pixel  locations  and  the 
FOE  is  assumed  to  occur  within  the  image  area,  the  order  of  complexity  in  computing 
the  error  surface  is  only  ^(iV^logiV).  Our  method  does  not  try  to  eliminate  the  depth 
field  from  (1).  Since  the  resulting  system  is  linear,  bounded  perturbations  in  the  optical 
flow  estimates  lead  to  a  deterministic,  bounded  offset  of  the  error  surface  minimum  from 
zero.  Noise  resilience  is  inherent  in  this  linear  formulation. 

A  significant  deviation  of  our  work  from  conventional  techniques  is  that  the  error 
surface  we  generate  gives  us  a  distributed  representation  of  the  confidence,  quantified  in 
terms  of  the  mean  squared  error  over  the  solution  space.  In  a  similar  manner,  Simon- 
celli  argues  for  a  distributed  representation  of  the  optical  flow  field  of  a  sequence  as  an 
alternative  to  a  unique  solution  [28].  He  shows  that  the  computed  error  of  a  candidate 
solution  is  related  to  its  likelihood  of  being  the  true  solution.  In  a  Bayesian  framework, 
a  distributed  representation  allows  for  error  covariance  propagation.  Likewise,  we  claim 
that  the  error  surface  embeds  information  regarding  both  the  presence  of  the  true  FOE 
and  a  confidence  measure.  A  relatively  fiat  error  surface  indicates  low  confidence  in  the 
solution  whereas  a  sharp  dip  indicates  high  confidence. 

In  order  to  demonstrate  the  superiority  of  our  approach  over  existing  techniques,  we 
use  four  criteria,  viz.  correctness,  robustness,  operating  range  and  utility.  At  a  minimum, 
any  valid  solution  to  a  problem  must  be  correct,  given  a  perfect  input  data  set.  This 
is  our  first  criterion,  which  we  prove  first  in  theory  and  then  demonstrate  on  artificially 
generated  perfect  flow  fields.  Next,  we  show  the  robustness  of  our  method  by  experiments 
on  noisy  and  sparse  flows.  In  order  to  traverse  the  operating  range  of  the  algorithm, 
we  evaluate  its  performance  over  a  range  of  fields  of  view  and  depth  map  structures. 
Finally,  we  consider  several  real-world  applications  calling  for  3D  SFM  to  demonstrate 
the  utility  of  our  approach.  We  use  real  imagery  in  this  final  set  of  experiments  covering 
3D  stabilization,  rangefinding,  independent  motion  localization  and  obstacle  detection. 

This  paper  is  organized  as  follows:  Section  2  introduces  the  concept  of  partial  search 
and  the  assumptions  of  our  approach.  The  theory  and  derivations  of  our  technique 
are  explained  in  Section  3,  with  some  of  the  proofs  given  in  the  Appendix.  Section  4 
extends  our  technique  by  relaxing  certain  assumptions  made  earlier.  The  experiments 
on  synthetic  and  real  data  are  detailed  in  Section  5.  Finally,  we  conclude  with  a  few 
remarks  in  Section  6. 

2  Problem  Formulation 

In  the  remainder  of  this  paper,  we  will  cissume  a  camera- centered  coordinate  system  whose 
origin  coincides  with  the  center  of  the  imaging  lens  and  whose  XT-plane  is  parallel  to  the 
image  plane.  We  will  also  assume  that  the  focal  length  of  the  lens  is  known  accurately, 
and  that  the  linear  distance  unit  is  normalized  by  the  focal  length,  i.e.  /  =  1.  The  Z 
axis  lies  along  the  optical  axis  and  the  image  plane  lies  on  Z  =  —1,  which  is  the  focal 
plane  of  the  camera.  Equation  (1)  can  be  rewritten  as 

u(x,  y)  =  -{x  -  Xf)h{x,  y)  -h  xyuj^  -  (1  +  +  yoJz 

y)  =  -{y  -  yj)h{^,  y)  +  (l  +  y'^)0Jx  -  xyuy  -  xuJz  (2) 
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where  {xf,yf)  is  known  as  the  focus  of  expansion  (FOE)  and  h{x,y)  is  the 

scaled  inverse  depth  map  of  the  scene  being  imaged,  h{x,y)  = 

3D  motion  algorithms  aim  at  computing  the  3D  motion  parameters  of  the  camera, 
viz.  t  =  and  a?  =  It  can  be  seen  that  t  can  only  be  recovered  up 

to  a  scale  factor,  i.e.  if  (t  =  is  a  solution,  so  is  [mto,u)o).  The  3D  structure 

of  the  scene  h{x,  y)  can  be  recovered,  once  the  3D  motion  parameters  are  known.  Many 
techniques  build  on  the  first  step  in  which  h{x,  y)  is  eliminated  from  (2),  but  this  makes 
the  solution  very  sensitive  to  noise  in  flow  estimates.  In  order  to  achieve  noise  resilience 
and  yet  keep  the  computational  demands  of  the  solution  within  reasonable  bounds,  we 
wish  to  devise  a  linear  solution  to  the  SFM  problem.  The  technique  we  propose  is  based 
on  fitting  a  linear  model  to  each  candidate  hypothesis  over  a  bounded  search  space. 

2.1  Partial  Search 

Exhaustive  search  has  seldom  been  accepted  by  theoreticians  as  the  best  technique  for 
arriving  at  the  solution  of  a  set  of  equations,  due  to  its  perceived  inelegance  and  lack 
of  mathematical  structure.  However,  in  practice  exhaustive  search  is  used  when  other 
methods  fail,  and  when  it  can  be  performed  in  a  reasonable  amount  of  time.  Suppose 
the  nonlinear  set  of  equations  for  which  a  solution  is  desired  is  given  by 

f(x)  =  0,  >M  (3) 

Exhaustive  search  for  a  solution  Xe  involves  {i)  enumerating  a  finite  set  of  candidate 
solutions  X  =  {xo,Xi,...}  that  adequately  cover  the  solution  space,  (u’)  computing  an 
error  metric  {e.g.  ||f(xi)|p)  which  associates  each  candidate  solution  x;  with  a  compliance 
measure,  and  (in)  locating  the  minimum  error  and  corresponding  candidate  solution, 
which  for  the  squared  error  metric  is 


Xe  =  argmn{||f(x)||2}  (4) 

In  general,  the  order  of  complexity  of  exhaustive  search  is  proportional  to  the  cardinality 
\X\  of  X.  This  can  get  unmanageably  large  as  the  dimensionality  M  of  x  increases. 

An  intermediate  approach  to  searching  for  all  the  components  of  the  solution  is  to 
enumerate  only  a  few  components  and  solve  for  the  remaining  components  based  on  the 
hypothesis.  For  example  in  (3),  assume  that  the  argument  x  can  be  partitioned  as 

a  €  +  M2  =  M  (5) 

and  given  a^,  (3)  can  be  solved  for  the  remaining  components  bj  with  a  small  number  of 
computations.  We  will  call  a  the  search  component  and  b  the  dependent  component  of  x. 
Partial  search  of  a  solution  Xp  is  performed  by  (i)  enumerating  a  finite  set  of  candidate 
partial  solutions  A  €.  {ao,ai,...}  that  adequately  cover  the  search  component  space, 
(m')  computing  the  dependent  component  bj  corresponding  to  each  a,  €  A  that  satisfies 
(or  closely  satisfies)  (3),  (Hi)  computing  an  error  metric,  for  example  |lf([a(|b(]')|p  for 
the  squared  error  case,  and  (iv)  picking  the  candidate  solution  corresponding  with  the 
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■minimum  error.  Step  (n)  can  be  defined  as  a  minimization  over  the  continuous  space  of 
permissible  b  of  the  same  squared  error  metric  to  formulate  the  partial  search  solution 
Xp  as 


Xp  = 


b*  = 

Partial  search  separates  the  problem  into  a  search  and  a  minimization.  In  general,  the 
complexity  of  the  original  problem  is  proportional  to  the  cardinality  \A\  of  A,  and  to  the 
number  of  operations  required  to  compute  b,-  given  a,-.  It  may  be  possible,  depending  on 
the  problem,  to  simplify  certain  steps  in  the  partial  search  procedure,  leading  to  further 
reduction  in  complexity.  This  paper  approaches  the  SFM  problem  through  partial  search 
and  exploits  the  symmetry  of  the  problem  to  minimize  the  computational  requirement. 

When  the  optical  flow  u,  v  is  known  at  M  points  {(xq,  t/o),  (a:i,  yi), . . .  (xm-i,  Vm-i)}, 
we  get  M  instances  of  (2)  with  2M  linearly  independent  equations^  and  M  +  5  unknowns 
in  all.  An  exhaustive  search  for  a  solution  must  consider  the  combinatorial  multiplicity 
of  the  M  +  5  unknowns.  This  gets  unwieldy  even  when  M  =  5,  which  is  the  minirrinm 
number  of  equation  sets  needed  for  a  unique  solution.  For  the  sake  of  argument,  assuming 
that  there  are  L  possible  discretized  permissible  values  for  each  of  the  scalar  variables, 
exhaustive  search  computation  requires  0(11'^'*’®)  operations.  A  crude  search  with  5% 
uncertainty  in  each  variable  (i. e.  L  =  10)  and  with  M  =  5  observed  points  gives  rise  to 
10^*^  combinations,  which  escalates  to  over  10^®  when  the  number  of  observations  doubles 
to  10.  Exhaustive  search  is  clearly  not  an  alternative,  and  indeed  we  require  that  the 
number  of  search  combinations  be  independent  of  the  number  of  observations  since  the 
latter  can  potentially  be  huge. 

In  order  to  restrict  the  search  to  a  finite  number  of  dimensions,  the  search  component 
a  must  not  include  the  depth  variables  h{xi,yi).  Of  the  remaining  five  variables,  the 
rotational  components  are  linearly  related  to  the  observation,  which  in  this  case  is  the 
flow  field.  Thus,  in  addition  to  the  scaled  inverse  depth  map,  rotation  forms  an  ideal 
candidate  for  including  in  the  dependent  component.  The  search  component  comprises 
Xf  and  yf  which  are  the  focus  of  expansion,  and  the  dependent  component  includes  the 
rotation  and  depth  variables  which  number  M  +  3  in  all.  It  can  be  seen  in  (2)  that  the 
dependent  component  and  observation  are  linearly  related  when  the  search  component 
or  FOE  is  fixed.  Given  a  search  component  a,-  =  {xf-,yf^y,  the  dependent  component 
bi  can  be  solved  for  by  means  of  a  (M  +  3)  x  (M  +  3)  matrix  inversion  which  takes 
at  most  0{M^)  steps.  The  worst-case  computational  requirement  of  this  partial  search 
approach  is  which  is  combinatorially  smaller  than  Yet  even  this 

number  is  too  large  for  practical  use.  The  remainder  of  this  section  analyzes  the  linear 

^Except  for  the  singular  case  where  there  is  no  rotation  and  one  of  the  M  points  coincides  with  the 
FOE. 


a’ 

b* 


argminmin  f 
&€A  b 


(:) 


argmm  f  ^ 


(6) 
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system  obtained  when  choosing  a  candidate  search  component,  with  the  eventual  aim  of 
reducing  the  order  of  complexity  of  partial  search. 

2.2  Assumptions 

We  will  begin  the  analysis  by  enumerating  the  assumptions  which  simplify  the  under¬ 
standing  of  our  approach.  As  we  progress,  we  will  loosen  some  of  the  assumptions  until 
only  the  essential  ones  remain.  Further,  we  will  argue  that  the  assumptions  are  realistic 
and  commonly  hold  in  practice.  In  the  following  paragraphs  describing  the  assumptions, 
the  first  sentence  is  the  initial  assumption  made,  for  ease  of  understanding.  The  sec¬ 
ond  sentence,  which  is  italicized^  is  the  minimum  necessary  to  prove  our  method.  The 
remainder  of  the  item  is  a  discussion  of  the  assumption. 

•  The  input  flow  field  is  defined  over  a  square  region,  of  size  N  x  N,  where  N  —  2^ . 
Can  he  fully  relaxed.  With  a  square  field  whose  side  is  a  power  of  2,  we  can  use 
the  fast  Fourier  transform  (FFT)  in  some  of  our  later  computations.  Other  sizes 
do  not  change  the  formulation,  but  might  lead  to  certain  bounded  inefficiencies  in 
the  computational  requirement. 

•  The  input  flow  field  is  dense.  Can  be  fully  relaxed.  We  first  prove  our  result  by 
assuming  that  a  dense  flow  field  is  available  for  the  N  xN  image.  Later,  we  discuss 
the  handling  of  sparse  flow  fields.  However,  the  complexity  of  the  proposed  method 
is  related  to  the  size  of  the  image  for  which  the  flow  is  available  and  not  to  the 
number  of  equation  sets  [i.e.  the  number  of  points  at  which  the  optical  flow  is 
known).  Thus,  the  number  of  computations  required  to  solve  the  problem  given  a 
sparse  flow  field  is  equal  to  the  corresponding  count  given  a  dense  flow  field. 

•  The  FOE  lies  within  the  area  defined  by  the  input  flow  field.  The  FOE  lies  unthin  a 
bounded  region.  The  primary  aim  of  this  effort  is  directed  toward  situations  where 
there  is  a  large  forward  translational  motion,  like  vehicle  navigation.  In  this  case, 
the  FOE  almost  always  lies  within  the  image.  In  extreme  situations  where  the  FOE 
is  very  large  or  at  infinity,  t^  is  very  small  in  comparison  with  one  or  both  of  t^  and 
ty.  Equation  (1)  can  be  approximated  as 

u{x,  y)  w  -txg{x,  y)  -f  xyuj^o  -  (1  +  x^)ojy  -1-  yu^ 

v{x,  y)  «  -tyg{x,  t/)  -t-  (1  -I-  y^)u>:,  -  xyujy  -  (7) 

For  this  system  of  equations,  there  are  only  M  +  4:  independently  determinable 
variables  since  there  is  still  a  scale  ambiguity  in  the  system.  However,  that  is  not 
all.  Vision  systems  in  practice  tend  to  have  small  fields  of  view,  or  equivalently, 
large  focal  lengths.  In  this  case,  x,y  <  1  and  the  contribution  of  the  second  terms 
in  (7)  is  small,  leading  to  a  first-order  approximation 

«(x,  y)  w  -tx9{x,  y)-ujy  +  yu^ 

v(x,  y)  K  -tyg{x,  y)  -f-  -  xu^  (8) 

The  instability  of  system  (7)  is  clear  when  one  notes  the  occurrence  of  multiple 

solutions  in  its  approximation  (8).  Assume  that  {ta;,ty,(jL!x,u)y,u}z,go,gi, . . .  gu-i} 
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is  a  valid  solution  for  the  M  instances  of  (8).  Then  the  sets  +  tyS.Uy  - 

txS,u}z,go  +  +  ^5  •  •  •5'M-i  +  are  also  valid  solutions  for  all  values  of  ^  >  0. 

The  positivity  constraint  on  8  ensures  that  depths  are  non-negative.  An  analysis  of 
the  instability  of  SFM  under  various  conditions  is  provided  in  Adiv  [29]  and  Young 
and  Chellappa  [30]. 

Secondly,  when  performing  3D  SFM  over  a  long  sequence  of  frames,  the  location 
of  the  FOE  at  the  previous  time  instant  is  known  after  the  first  frame  for  which 
the  motion  field  is  recovered.  If  it  can  be  assumed  that  the  FOE  does  not  change 
drastically  between  frames,  the  search  window  for  the  FOE  at  the  current  frame 
can  be  shifted  and  re-centered  around  the  estimated  FOE  at  the  previous  frame. 
Under  the  relaxed  assumption  that  the  FOE  must  lie  within  a  bounded  area,  the 
re-centering  strategy  will  work. 

Finally,  it  must  be  borne  in  mind  that  the  output  of  our  algorithm  is  not  a  single 
solution;  rather,  it  is  an  error  metric  for  the  entire  search  space.  When  the  FOE 
lies  outside  the  search  space,  we  expect  that  the  shape  of  the  error  surface  will  show 
a  dip  in  the  direction  of  the  true  FOE.  In  our  experiments,  we  have  observed  that 
when  the  true  FOE  lies  outside  the  search  area,  the  minimum  is  attained  at  its 
periphery,  indicating  the  likelihood  of  the  true  minimum  lying  beyond.  This  flags 
an  error  condition  and  provides  a  candidate  offset  by  which  to  shift  the  search  area. 

In  summary,  the  only  non-trivial  situation  where  we  expect  the  proposed  method  to 
fail  is  when  the  following  three  conditions  simultaneously  hold:  (i)  wide  field  of  view, 
(it)  very  small  forward  translation,  and  (in)  no  available  estimate  of  the  FOE  from  the 
prior  frame  that  can  form  a  reasonable  guess  to  recenter  the  search  area.  Even  in  this 
situation,  if  the  minimum  error  is  attained  at  the  search  boundary  (which  is  very  likely 
to  be  the  case),  we  can  iteratively  re-center  and  locate  the  error  surface  minimum.  In 
the  next  section,  we  develop  the  theory  behind  our  approach. 

3  Approach 

Let  the  true  FOE  be  (x/,j//).  Assuming  that  the  flow  field  is  of  size  N  x  N  and  all 
flow  estimates  are  available,  the  optical  flow  at  pixel  location  ij€  {0, 1, . . . ,  W  -  1}^  is 
given  by 


"b  ~  (1  +  xf  j)u}y  -f-  yijUJz 

=  —{yi,j  -  yf)hi,j  -h  (1  -f-  ylj)(^x  -  XijyijUy  —  (9) 

where  x.j  =  and  =  {h,u,v}(xi,j,yij).  The  trans¬ 

formation  between  the  pixel  coordinate  sy:;tem  point  (i,j)  and  the  normalized  3D  coor¬ 
dinate  system  point  (x,  y)  on  the  image  p,:  ne  is  reversible  and  is  given  by 


(10) 
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Thus,  the  optical  center  lies  at  in  the  pixel  coordinate  system.  WLOG,  we 

will  switch  between  coordinate  systems  to  simplify  notation  wherever  necessary.  Define 

=  ( 1  +  ylj  ) 


Q 


'0,0 

^0,0 

^0,1 

^0,1 


h 

u 


^iv-i,Ar-i 

—  ^  hofi  ho,i  .  •  •  ^jv-i,Ar-i  ) 

(  «0,0  Vo,0  U0,1  Vo,l  .  .  .  U/^-i,JV-l 


xo,o  —  0  0 

2/0,0  -  2//  0  0 

0  xo,i-^f  0 

0  yo,i  -  2//  0 

0  0  Xo,2  —  Xf 

0  0  yo,2  -  Vf 


vn-i,n-i 


0 


0 


A(x/,y/) 

Xo 


h  ■ 

u: 


xn-i,n-i  —  Xf 
yN-i,N-i  —  Vf  . 


(11) 


We  will  drop  the  axgument  {xf,yf)  of  P(x/,  j//)  and  A(x/,y/)  where  it  is  obvious.  The 
above  definitions  allow  us  to  consolidate  the  motion  equations  for  all  individual  flow 


vectors  in  the  brief  form 


Q  1 


h 

UJ 


=  U 


(12) 


i.e. 

A(x/,y/)  Xo  =  u.  (13) 

Replacing  the  unknowns  Xf,yf  and  Xq  by  the  hypothesized  variables  Xh,yh  and  x  we  get 
the  general  condition 

A{xh,yh)  y- ^  (14) 

where  the  true  solution  exactly  satisfies  (14). 

We  now  define  a  squared  error  cost  function  C{xh,  yh','^)  as 


C{xk,yh,^)  =  \\A{xh,yh)  ^\\l 
8 


(15) 


Since 


C{xh,yh-,'x)>^  (16) 

C[xj,yf,-Ko)  =  {)  (17) 

{i)  the  true  solution  to  the  system  (14)  minimizes  the  cost  function  C{  ),  and  (u‘)  all 
minimizers  of  C'( )  satisfy  (14)  exactly.  Thus,  we  have  reduced  the  original  problem  to 


(18) 

which  can  be  decomposed  as 

minminC(xfc,t/;„x)  (19) 

The  inner  minimization  occurs  at  the  least  squares  (LS)  solution  Xls  of  A{xh^yh)  x  — *■ 
u.  Later,  we  will  prove  uniqueness  of  xi,5.  Indeed,  C{  )  can  have  more  than  one 
minimizer.  In  the  separable  form  (19),  existence  of  multiple  minima  is  indicated  by 
existence  of  multiple  values  of  Xh,yh  (and  thereby  of  x)  attaining  the  minimum  for  C{ ). 
In  this  situation,  exhaustive  search  over  the  entire  search  space  will  pick  out  all  the  valid 
solutions.  However,  it  is  not  difficult  to  see  that  even  with  coarse  discretization,  the 
number  of  free  variables  is  too  large  to  permit  exhaustive  search.  Referring  to  our  earlier 
discussion  of  search,  we  see  that  partial  search  is  an  ideal  technique  for  solving  (19). 

In  order  to  perform  partial  search,  we  set  {xh,yh}  to  be  the  search  component  and  x 
to  be  the  dependent  component.  We  discretize  the  search  component  space  at  midway 
locations  between  four  pixels^  over  the  entire  image  area,  in  line  with  our  assumption 
that  the  FOE  lies  within  the  image.  Later,  we  will  relax  the  search  component  space  to 
be  any  uniformly  discretized  rectangular  set  of  points,  not  necessarily  within  the  image. 
We  do  not  dispute  that  one  could  potentially  construct  a  pathological  counter-example 
that  lets  the  minima  “slip  through”  the  lattice  formed  by  discrete  values.  But  it  is  our 
belief  that  such  situations  do  not  occur  in  practice.  We  will  experimentally  justify  the 
discretization  process. 

The  LS  solution  x^s  of  A{xh,yh)  x  — +  u  satisfies 


A'Axls 


P'P  P'Q 
Q'P  Q'Q 


Xz,5 


A'u 

r  p' 


(20) 

(21) 


where  the  arguments  of  A  and  P  have  been  dropped.  D  *=  P'P  is  a  diagonal  matrix, 
given  by 

D  =  Diag{x?-fy?} 
where  Xj  and  y,-  are  functions  of  (xk^yh)- 


Xi  ,imod?V 

Vi  y[tyiVJ,imod/V’  Vh 

^Hypothesizing  the  FOE  to  lie  at  pixel  locations  leads  to  a  singularity. 
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We  shall  assume  here  that  Xi  and  yi  are  never  zero.  This  can  be  achieved  by  ensuring 
that  the  discretization  grid  for  FOE  hypotheses  is  not  coincident  with  the  pixel  sampling 
grid.  Even  this  restriction  can  be  overcome,  but  we  shall  deal  with  it  later,  assuming  for 
now  that  D  is  never  singular.  One  way  of  guaranteeing  this  is  to  pick  the  hypothesized 
FOE  to  lie  midway  between  pixels.  When  D  is  nonsingular,  xz,s  is  unique.  Applying 
appropriate  pre- multiplying  matrices,  we  can  manipulate  the  system  as  shown: 


D  P'Q 
Q'P  Q'Q 
I  D-ip'Q 
Q'P  Q'Q 
D-ip'Q 


X  = 


X  = 


0  Q'Q  -  Q'PD-ip'Q  J  ^  [  Q'(I  -  PD-ip') 

Introducing  matrices  M  €  and  M  €  defined  as 

M  =  (I-PD-^P') 

=  BlockDiagI  ^  a  I 

M  =  Q'MQ 


u 


P' 

Q' 

D-ip' 

Q' 

D-ip' 


u 


u 


we  get 


I  D-ip'Q 
0  M 

I  D'P'Q 
0  I 


X  = 


X  = 


u 


^  X  = 


D-ip' 

Q'M 

D-ip' 

M-iQ'M 

D-ip'(I-QM-^Q'M) 

M-iQ'M 


u 


u 


The  error  Ax  —  u  is 

Ax  —  u  =  [  P  Q  ] 


D-ip'(I-QM-iQ'M) 

M-iQ'M 


u  —  u 


giving  the  squared  error 


I  Ax  —  u| 


(mqm  ^Q'M-m)u 

=  u' (^MQM'^Q'M-M^  u 


=  u'M^u  +  u'MQM  ^Q'M^QM  'Q'Mu 


-  2u'M*qM  'q'Mu 


(24) 


(25) 

(26) 


(27) 


(28) 


(29) 
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Again,  note  that  A,  M  and  M  are  functions  of  {xh,yh)-  It  can  be  easily  verified  that  M 
is  idempotent,  i.e.  =  M.  (29)  sinaplifies  to 

II  Ax  -  u IP  =  u'Mu  -  u'MQM"^Q'Mu  (30) 

The  most  naive  strategy  for  computing  the  least  squared  error,  or  equivalently,  of 
performing  the  inner  minimization  in  (19),  is  to  explicitly  solve  the  linear  system  for  the 
unknown  x  and  use  this  estimate  to  evaluate  the  squared  error.  W^ithout  assuming  any 
sparseness  or  symmetry  in  the  coefficient  matrix  A,  the  system  can  be  solved  by  matrix 
inversion.  Keeping  in  mind  that  x  €  3?^'+^  the  order  of  complexity  of  estimating  x  in 
this  manner  is  0{N^)^.  Exploiting  the  structure  of  A  leads  to  dramatic  improvements. 
From  (25)  and  (26),  it  can  be  seen  that  computing  matrices  M  and  M  requires  0{N^) 
operations,  which  is  the  complexity  of  estimating  x  and  the  squared  error  as  well.  Taking 
into  account  the  outer  minimization  search  leads  to  an  overall  complexity  of  0{N^)  for 
the  matrix  inversion  method  and  Cl(iV'*)  by  exploiting  symmetry.  In  addition,  examining 
(30)  reveals  that  the  only  data-dependent  term  is  u.  Thus,  given  sufficient  memory, 
the  data-independent  terms  can  be  pre-computed  and  stored,  for  each  value  of  {xh,yh)- 
However,  even  with  this  strategy,  the  overall  complexity  cannot  be  brought  down  below 
0{N^). 

In  what  forms  the  core  of  this  effort,  we  will  show  that  the  structure  in  A  can  be 
further  exploited  so  that  the  errors  can  be  computed  directly^  without  computing  the 
solution  X  explicitly.  Moreover,  the  least  squared  errors  for  all  the  candidate  hypotheses 
can  be  computed  in  a  single  step,  which  leads  to  an  overall  complexity  of  0{N'^  log  N). 
At  first,  this  seems  ridiculous  since  factoring  out  from  the  complexity  introduced  by 
the  outer  search  leaves  O(logA')  which  is  insufficient  even  for  vector  addition.  Yet,  it  is 
the  simultaneous  estimation  of  all  errors  in  the  search  space  that  allows  such  a  low  overall 
complexity.  We  introduce  the  notion  of  Fast  Computability  in  the  following  section. 


3.1  Fast  Computability 

A  few  preliminary  definitions  and  theorems  are  necessciry  before  we  proceed  with  the 

proof  of  our  technique.  Proofs  of  the  Fast  Computability  theorems  are  given  in  the 
Appendix. 


Definition  1  Let  S  G  The  cyclic  shift  Sfiojio]  ofS  by  (io,io)  is  defined  by 


S[*0;yo]i,j  —  S 


(t+io )  modA^,(  j+jo  )modAr 


(31) 


Definition  2 
defined  by 


The  lexical  ordering  of  a  matrix  S  e  denoted  by  S  e  is 


0 

S  [j'/JVJ  ,imodiV 


i  =  j 


(32) 


£  is  the  lexical  ordering  operator.  The  inverse  operation  is  defined  only  on  diagonal 
matrices: 


^More  accurately,  [31]1 


(33) 
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We  denote  the  space  of  such  diagonal  matrices  by  where  the  argument  3^  denotes 

the  space  of  each  element  Sj  of  the  matrix  S. 

Definition  3  The  cyclic  shift  of  a  lexically  ordered  matrix  S  €  by  (ioiio);  is 

5[io,jol  =  C  (£-'(5li„,j„l))  =  £(S|i„,jol)  (34) 


Definition  4  The  quantity  q(io,jo)  is  said  to  be  Fast  Computable  (FC)  if  q{io,jo)  can 
be  evaluated  \/ioJo  €  {0, 1, . . . ,  iV  -  1}  in  0{N^  logiV)  computational  steps. 

Theorem  1  Let  a,b  £  and  S  €  The  quantity  ?(eo,io)  €  3?  defined  by 

9(*o,io)  =  a'5[io,  jo]^>  is  FC. 


Proof:  Appendix  A.l. 

We  now  extend  Theorem  1  to  a  more  complicated  situation  where  each  scalar  element 
of  the  above  data  structures,  including  matrices  and  vectors,  is  replaced  by  a  doublet. 
The  doublet  corresponding  to  a  scalar  matrix  entry  is  defined  as  a  2  x  2  submatrix  and 
a  doublet  of  a  vector  component  as  a  2-vector.  In  other  words,  each  scalar  element  in 
the  matrices  is  replaced  by  a  2  x  2  real  matrix,  and  each  scalar  element  in  the  vectors 
by  a  2  X  1  real  vector.  The  concepts  of  cyclic  shift,  lexical  ordering  and  inverse  lexical 
ordering  are  redefined  below. 


Definition  5  Let 


1  c 

SiJuJ 

0^2X2 

So,o 

So,i 

So,N-l 

Si,o 

Si,i 

Stv-1,0 

Sjv-1,1 

S)v-i,jv-i  . 

So,Ooo 

So.ool 

^o.Iqo 

So,N- 

loi 

So,Oio 

So.oii 

So, Ho  •  • 

So,N- 

111 

^i»ooo 

Sl,Ooi 

Si, loo 

Si,jv- 

loi 

Stv— l,0n 

Sn-1,1io 

Sjv-i,Ar-iii 

The  cyclic  shift  S[zo,io]  o/S  by  {io,jo)  is  defined  by 


i.e.  S[io,yo] 


S 


(t+to  )modAr,(j+jfo)modAr 


s 


to  laodNJo  modiV 


s 


{N—1 +to )  modiV  Jo  mod  AT 


s 


to  modN,  (TV'— l+io )  modiV 


S  (TV — l+to  )modTV,  (TV — 1  +  jo )  modTV 


(36) 
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Definition  6  The  lexical  ordering  of  the  doublet  matrix  S  shown  in  (35),  denoted  bv 
is  defined  bg 


S  \i/ N } fimodN'^ 

=  BlockDiagll  SLi/Arj.imodJVoi  \  | 

1.  \  ®  [t/iVJ  itmod/VjQ  S|^jy;Vj^jjjjod/Vjj  J  j 

In  keeping  with  Definition  2,  we  will  denote  the  space  of  permissible  S  matrices  which 
constitute  the  lexical  ordering  of  a  doublet  matrix  by 

Definition  7  The  cyclic  shift  of  a  lexically  ordered  matrix  S  G  is  defined  by 

Theorem  2  Let  S  e  £^'(^><2)^  ^nd  a,  b  €  The  quantity  q{io,jo)  =  a'S[io,jo]b  is 


Proof:  Appendix  A. 2. 

3.2  Non-cyclic  Shift 

The  last  step  in  setting  the  stage  to  prove  fast  computability  of  the  squared  error  is 
extending  Theorem  2  to  non-cyclic  shifts.  When  a  space-limited  data  sequence  is  shifted 
across  a  viewing  window,  points  that  were  undefined  earlier  appear  within  the  window. 
In  a  cyclic  shift,  the  data  points  of  the  original  sequence  that  disappear  from  the  viewing 
area  are  wrapped  around  to  fill  the  locations  within  the  newly  visible  area.  However, 
when  the  shift  is  not  cyclic,  it  is  necessary  to  define  its  behavior,  especially  with  regard 
to  how  emerging  areas  are  filled  in. 

Definition  8  The  non-cyclic ^hift  S(*05jo)  of  matrix  S  G  by  {io,jo)  is  defined  in 

terms  of  the  superset  matrix  S  G 

S{ioJo)i,j  =  Si+io,i+io,  Vz'cio  G  {0, 1, . . . ,iV  -  1}.  (38) 

Definition  9  The  non-cyclic  shift  of  a  lexically  ordered  matrix  S  G  is  defined 

by 

^[>'0,jo\=  T{S{ioJo))  (39) 

Theorem  3  Let  a,b  G  and  S  G  The  quantity  q{ioJo)  €  3?  defined  by 

9(*o,Jo)  =  a'S{iojQ)b  is  FC. 

Proof:  Appendix  A.3. 

We  now  extend  Theorem  3  to  the  doublet  space. 

Theorem  4  Let  a,b  e  and  S  G  £^'(3J2x2),  quantity  q{io,jo)  €  ^  defined  by 
9(^0, io)  =  ci'S{io,jQ)b  is  FC. 


Proof:  Appendix  A.4. 

Finally,  we  extend  the  above  theorem  to  arbitrary  finite- dimensional  aggregations  in 
the  following  results. 

Corollary  1  Let  S  €  a  €  3^^^'  and  B  €  for  some  arbitrary  integer 

p>l.  The  quantity  q{io,jo)  =  a'S{io,jo)B  is  FC. 

Proof:  Writing  out  q{io',jo)  in  terms  of  its  components, 

9(*o,io)  =  (  jo]&o  •••  a'S[io,jo]bp-i  )  (40) 

where  B  =  {Bq  bj  ...  6p_i).  Each  of  the  components  of  q{io,yo)  is  FC.  When  p  is  a 
constant,  the  overall  computational  steps  are  still  0(iV’^  log  TV). 

Corollary  2  Likewise,  let  A,B  ^  p  being  an  arbitrary  constant.  Q{io,jo)  = 

A'S[io,jo]B  is  FC. 

Proof:  As  in  Corollary  1. 


3.3  Basic  Proof 


Next,  we  show  how  the  squared  error  given  by  equation  (30)  is  Fast  Computable,  for  every 
choice  of  (a;/,y/).  This  enables  the  likely  solution  space  to  be  searched  exhaustively  in 
0{N^  log  N)  steps.  First,  we  use  (10)  to  define  the  pixel  coordinate  (ih,jh)  corresponding 
to  the  hypothesized  FOE  {xh,  yh)  as 


I  fyk  +  ^ 


(41) 


We  define  the  search  space  for  {ihijh)  fo  be  ,  W  —  |  j  .  The  candidate  solutions 

lie  at  half-pixel  displacements  along  a  regular  grid  covering  the  image  area.  In  the 
discussion  ahead,  we  will  interchangably  use  the  pixel  and  XY  coordinate  systems. 


Lemma  1  M  € 


Proof:  From  (25)  and  Definition  6. 


Definition  10  The  superset  matrix  M  G  is  a  doublet  matrix  defined  by 


= 


(j-N+lfi 

-{i-N+l){j-N+l) 


.{i-N  +  l){j-N+l) 

(i-N  +  lY 


(i-N  +  iy  +  U-N  +  iy 


(42) 


Vz,ie{0,l,...,2iV-l}. 
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In  the  above  definition,  the  |  terms  are  included  to  stagger  the  grid.  There  is  a  one-to-one 
relation  between  the  hypothesized  location  (ihdh)  and  the  index  (ij)  according  to 

{kjh}  ^  {i,j} +  ^  (43) 

The  importance  of  staggering  the  sampling  grid  for  the  search  component  is  seen  from 
(42),  where  the  |  terms  prevent  the  denominator  of  the  leading  fraction  from  vanishing. 
However,  it  must  be  noted  that  this  fix  is  merely  cosmetic  since  even  without  staggering, 
the  entry  Miv,jv  at  the  singular  point  can  be  evaluated  using  limits. 

Lemma  2  M(x;i,  yh)  =  M(iV  —  1  —  [i/ij ,  AT  —  1  —  [j;ij)  to  within  half-pixel  discretization. 


Proof:  From  (23),  (25)  and  (10),  we  have 


£-^(M(x,,y,)ki 

{.Vi, 3  ~~  Vh)  ~(^ij  ~  ^h){yi,j  ~  Uh) 

_ -  xh){yi,j  -  Vh)  {xi,j-xhf 

(®jj  ~  -)-  (t/jj  —  J//i)^ 

■  ^  -  M)(i  -  V-  -  fvh) _ (^'  -  ^  -  fxhY _ [ 

(i  -  ^  -  fx,y  +  {j-^-  fynY  ^  ^ 

Comparing  (42)  with  (44),  we  see  that  £~^(M)  is  a  windowed  version  of  the  superset 
matrix  M.  The  location  of  this  window  (^,  1)  is  computed  by  equating  the  indices.  In 
general,  since  the  computed  location  may  not  be  integral,  rounding  is  performed.  The 
following  equation  evaluates  k.  I  is  evaluated  likewise. 


N-1 

2 


Thus,  upon  discretization,  we  get 


C  ^(M(a:;i,y/i))ij  =  M,+(;v-i)-L4J,j+(iV-i)-LjhJ  (46) 

which  gives  the  non-cyclic  shift  relationship  in  mixed  coordinate  system  notation 

M(a;;i,  yh)  =  M(iV  -  1  -  [4J ,  iV  -  1  -  [jh\ )  (47) 

This  result  is  key  to  tying  in  fast  computation  with  the  SFM  problem.  Lemma  2  maps  the 
FOE  search  space  [-f,  f  to  the  discrete  non-cyclic  shifts  to,  io  G  {iV-1,  iV-2, . . . ,  1, 0} 
of  M(to,io). 
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Lemma  3  M  is  FC. 


Proof: 

M  =  Q'M(a;^,j//)Q  =  Q'M(io,io)Q  (48) 

which  is  FC,  from  Corollary  2  and  Lemma  2,  since  Q  € 

Lemma  4  M”^  is  FC. 

Proof:  From  Lemma  3,  M  can  be  evaluated  over  the  search  space  in  0{N^  log  N) 
operations.  A  3  x  3  matrix  can  be  inverted  in  a  constant  number  p  of  operations.  Inverting 
M  over  all  the  hypotheses  takes  N'^p  operations.  The  overall  complexity  is  still 
0{N^  log  N)  and  hence  is  FC. 

Lemma  5  u'Mu  and  Q'Mu  are  FC. 

Proof:  From  Corollary  1  and  Corollary  2. 

Theorem  5  The  squared  error  (30 j  is  FC. 

Proof:  Evaluating  (30)  in  the  manner  indicated  by  the  underbraces  in  the  equation 

||Ax-u||"  (49) 

groups  the  right-hand  side  as  a  sum  or  product  of  Fast  Computable  terms.  The  products 
are  among  3x3  matrices  and  3x1  vectors.  The  product  and  sum  operations  them¬ 
selves  require  0{N'^)  computations  to  evaluate  over  the  entire  search  space.  The  overall 
complexity  is  O(iV^logiV),  and  the  squared  error  is  FC. 

In  (49),  it  can  be  seen  that  the  terms  u'MQ  and  M  =  Q'Mu  axe  transposes  of 
each  other.  There  are,  in  all,  three  quantities  whose  evaluation  is  based  on  the  Fast 
Computation  theorems,  viz.  u'Mu,  u'MQ  and  Q'MQ.  Of  these,  the  last  term  is  data- 
independent.  In  our  basic  solution,  therefore,  M  and  its  inverse_can  be  computed  before¬ 
hand.  Likewise,  the  Fourier  transform  of  the  superset  matrix  M,  which  is  evaluated  for 
computing  M,  is  stored  in  memory  for  later  use  in  computing  u'Mu  and  u'MQ.  Next, 
we  list  the  steps  of  the  basic  algorithm. 

3.4  Algorithm 

•  Step  1:  Compute  the  superset  matrix  M  and  its  Fourier  transform. 

•  Step  2:  Compute  M(io,io)  =  Q'M(io,  jo)Q  and  its  inverse  for  all  locations  of  the 
candidate  solution. 

•  Step  3:  Using  the  given  dense  optical  flow  field,  compute  u'M(io,  io)u  and  u'M(zo5  io)Q- 

•  Step  4-'  Form  the  products  and  compute  the  error  for  each  candidate  hypothesis  of 
the  FOE. 


16 


•  Step  5:  Pick  the  location  of  the  FOE  corresponding  to  the  snaallest  squared  error. 

•  Step  6:  Repeat  from  step  3  for  the  next  data  set. 

The  basic  building  block  of  the  algorithm  is  a  function  that  performs  the  Fast  Compu¬ 
tation  a'M(io,io)&  for  the  arguments  a  and  h.  Since  this  is  central  to  the  approach,  we 
have  provided  a  detailed  algorithm  for  this  function,  which  we  term  FastCompute,  in 
Appendix  B. 

4  Relaxing  the  Assumptions 

How  does  the  proposed  solution  change  if  the  velocity  estimate  u,  v  is  not  available  for 
a  particular  point  {i,j)  ?  In  such  a  situation  the  corresponding  equation  pair  (9)  is  also 
not  available.  If,  with  no  increase  in  computational  complexity,  we  can  replace  (9)  with  a 
set  of  equations  that  retains  the  coefficients  of  j  and  yet  does  not  influence  the  solution 
in  any  manner,  we  can  extend  our  technique  to  both  sparse  flow  fields  as  well  as  non-2* 
image  sizes. 

Consider  the  equations 

0  ^ d”  “I”  Oujy  -(-  OcOz 

0  =  ~{yi,j  ~  yf)K,j  +  Ocjj;  -f  Oojj,  +  (50) 

The  consistent  solution  of  these  equations  is  hij  =  0,  with  u  being  indeterminate.  Since 
the  coefficients  of  w  are  zero,  appending  this  set  of  equations  to  our  system  does  not 
influence  the  solution.  Obviously,  the  depth  at  point  i,j  cannot  be  estimated.  Replacing 
(9)  by  (50)  for  every  pixel  at  which  the  optical  flow  is  not  known  preserves  the  structure 
of  P{xf^yf),  although  the  corresponding  rows  of  Q  must  be  set  to  zero.  With  this 
substitution,  the  reasoning  in  Section  3  holds  and  the  FOE  can  still  be  estimated  in 
0{N'^  log  N)  steps. 

The  above  substitution  allows  us  to  zero-pad  a  flow  field  if  needed  to  maJce  its  size  a 
power  of  2.  Moreover,  since  most  optical  flow  techniques  produce  sparse  flow  fields,  this 
substitution  allows  us  to  interface  with  these  methods.  There  is  no  need  to  interpolate 
sparse  flow  fields  —  an  operation  which  may  lead  to  reduced  accuracy.  Yet  another 
situation  calling  for  zero-padding  is  where  the  search  area  is  larger  than  the  image,  even 
when  the  latter  is  a  power  of  2  in  size. 

Indeed,  (50)  is  a  special  case  of  weighting  (9)  by  zero.  By  using  a  continuum  of 
weights,  reliability  measures  of  the  flow  field  can  be  incorporated  into  the  system.  The 
depth  map  must  be  suitably  rescaled  to  retain  the  coefficients  of  so  as  to  preserve 
the  structure  of  P(x/,y/)  and  thereby  validate  the  reasoning. 

If  it  is  known  by  some  means  that  the  FOE  is  present  in  an  area  around  iCQjgf,  ^off ’ 
our  method  can  be  made  more  effective  by  incorporating  this  knowledge  into  the  formu¬ 
lation.  Assume  that  the  offset  in  the  pixel  coordinate  system  is  corresponding 
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to  a^Q^jp,  y^Q.  The  superset  matrix  M  is  redefined  by 


M 


(!  -  toff  -  W  +  \){j  -JoS-N  +  i) 


(i  _ 


(i_io5-iv+i)2+a-ioff-iv+i)= 


(51) 


^  {0,1,..., 2Af  —  1}.  This  definition  offsets  the  search  window  by  *off’-^ofF' 
remainder  of  the  computation  process  remains  the  same.  However,  the  offset  must  be 
added  to  argmin^^.j,;,  || A(a;;i,  j/fe)x  —  u|p  when  estimating  the  FOE. 

Shifting  the  search  area  by  an  offset  allows  the  fusion  of  external  information  into 
our  algorithm.  For  example,  while  processing  a  sequence  of  images,  the  search  window 
can  be  re-centered  at  the  FOE  estimate  of  the  previous  frame.  If  it  is  assumed  that 
acceleration  is  small  from  frame  to  frame,  such  re-centering  improves  the  probability  of 
finding  the  true  FOE  within  the  search  area.  Alternatively,  if  a  kinematic  model  exists 
for  the  camera  platform,  the  velocities  can  be  predicted  from  the  past  behavior.  This 
provides  a  useful  starting  guess  for  the  FOE. 

Finally,  we  claim  without  proof  that  the  overall  complexity  of  estimating  the  FOE  hy¬ 
pothesized  to  lie  in  a  Mx  x  My  area  (potentially  offset  from  the  center),  given  a  flow  field 
of  size  Nx  X  Ny  (potentially  sparse),  is  0{M2N2\o%M2N2)-,  where  M2  = 
and  N2  =  Although  this  is  within  the  same  order  of  magnitude  for  im¬ 

ages  whose  aspect  ratio  is  bounded,  this  expression  is  derived  by  tightening  the  Fast 
Computability  proofs. 


5  Experiments  and  Results 

In  this  section,  we  describe  our  experiments  on  evaluating  the  partial  search  FOE  estima¬ 
tion  algorithm.  Our  experiments  comprise  two  phases,  viz.  a  first  phase  which  measures 
quantitative  performance  on  synthetic  data  generated  with  known  parameters,  followed 
by  a  second  phase  which  examines  qualitative  performance  on  real-world  imagery  with 
emphasis  on  useful  applications.  Using  synthetic  data  allows  us  to  accurately  character¬ 
ize  performance  over  a  range  of  situations.  In  the  first  phase,  the  optical  flow  field  is 
synthesized.  On  real  data,  the  flow  is  obtained  from  a  sequence  of  images  using  optical 
flow  techniques  described  later.  The  synthetic  data  experiments  demonstrate  the  cor¬ 
rectness,  operating  range  and  robustness  of  our  algorithm.  The  utility  of  the  algorithm 
in  solving  real-world  problems  is  shown  using  real  data. 

During  the  course  of  the  experiments,  the  proposed  algorithm  is  applied  to  the  flow 
field  and  the  location  corresponding  to  the  minimum  error  in  (30)  is  picked  as  the  FOE 
(x,  y)  (or  equivalently,  (i,i)  in  the  pixel  coordinate  system).  A  correction  of  |  pixel  is 
applied  to  each  direction,  to  undo  the  effect  of  staggering  (43).  Once  the  FOE  is  esti¬ 
mated,  the  angular  velocity  and  depth  map  can  be  obtained  by  solving  linear  equations. 
From  (27),  we  can  see  that  the  LS  estimate  for  uj  is 

w  =  M-^Q'Mu  (52) 

which  is  a  product  of  terms  that  have  already  been  calculated.  Therefore,  the  estimate 
of  rotation  is  available  with  no  extra  computation. 
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Set 

N 

/ 

U 

jf 

UJy 

d 

A 

256 

400 

102.0 

51.0 

-2.0 

5.0 

8.0 

1.5 

B 

256 

400 

127.5 

201.5 

5.0 

3.0 

4.0 

1.7 

Table  1:  Parameters  used  in  generating  synthetic  data  sets  A  and  B 


The  depth  map  takes  some  additional  effort  to  recover.  Using  (27),  we  get 

k  =  D-^P'(I  -  QM-iQ'M)u  (53) 

which  requires  the  computation  of  D“^P'Q  and  D“^P'u.  These  take  O(N^)  operations. 
It  must  be  noted  that  the  depth  map  is  very  sensitive  to  perturbations  in  the  optical  flow 
as  well  as  the  FOE  estimate,  especially  near  the  latter.  Although  depth  map  recovery  is 
not  the  main  emphasis  of  this  work,  we  will  give  relevant  experimental  results. 


5.1  Experiments  with  Synthetic  Data 

The  least  squared  error  as  a  function  of  the  hypothesized  FOE  is  denoted  by  E,  and  is 
referred  to  as  the  error  surface.  The  minimum  value  of  the  error  surface,  which  occurs 
(bj)j  is  denoted  by  Emin-  Incorporating  the  round-off  error  in  estimating  the  FOE, 
the  error  in  the  FOE  estimate,  e,  is  given  by 


e 


min 


i  —  io 
j  -  jo 


t  -  ^o 

j  -  ji 


Z  — 

j  -  jo 


I 

A 

j 


(54) 


where  {i,i}o  =  [{bi}/  ~  2 J  "*■  2  “  2I  r  2  ^®rms  compensate 

for  staggering  of  the  hypothesis  grid.  This  expression  provides  a  4-pixel  neighborhood 
associated  with  zero  error,  if  the  true  FOE  lies  off  the  |-pixel  offset  grid  lines  along  each 
axis.  When  the  true  FOE  is  exactly  on  a  grid  point,  the  neighborhood  shrinks  to  one 
pixel. 

In  many  favorable  situations,  our  algorithm  estimates  the  FOE  with  no  error.  We 
use  a  secondary  error  metric,  which  is  the  total  angular  error,  given  by 


Cw  =  ||w  -  a;|| 

to  characterize  and  differentiate  between  cases  where  e  is  not  usable. 


(55) 


5.1.1  Generation  of  Synthetic  Data 

Using  (9),  we  generate  optical  flow  fields  corresponding  to  chosen  inverse  depth  maps, 
FOEs  and  rotational  velocities.  The  depth  map,  in  the  real  world,  is  comprised  of  largely 
smooth  regions  bounded  by  sharp  discontinuities.  The  size  of  these  regions  varies  widely. 
In  order  to  closely  model  the  real  world,  the  depth  map  is  generated  using  a  fractal 
model.  This  is  motivated  by  the  use  of  fractal  models  for  luminance  images  which, 
like  depth  maps,  are  also  composed  of  largely  smooth  regions  and  discontinuities.  The 
Fourier  transform  method  is  used  to  generate  the  depth  map  fractal.  In  this  method,  the 
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Set 

i 

i 

e 

Uy 

Cjz 

€.{jj 

A 

101.5 

51.5 

0.00 

-1.985 

5.013 

-7.995 

0.020 

B 

127.5 

0.00 

3.000 

4.000 

0.000 

Table  2:  Performance  on  dense,  noiseless  flow  fields 

transform  magnitude  is  chosen  to  be  exponential,  of  the  form  (/^  -|-  The  phase 

is  random,  uniformly  distributed  in  [0,27r).  We  call  d  the  fractal  exponent.  Upon  inverse 
Fourier  transformation,  the  resulting  images  (both  real  and  imaginary  components  of  the 
transform)  are  fractals.  Choosing  smaller  fractal  exponents  gives  more  intricate  fractals. 
The  fractals  so  generated  are  used  as  synthetic  inverse  depth  maps. 

For  our  experiments,  we  chose  N  =  256.  We  generated  two  baseline  data  sets  A 
and  B  with  parameters  shown  in  Table  1.  The  angular  velocities  are  in  m  rad,  and  the 
focal  length  /  is  in  pixel  units.  The  FOE  is  shown  in  the  pixel  coordinate  system,  with 
the  origin  located  at  the  top  left  corner  of  the  image  frame.  The  i  axis  is  down  the 
rows  and  j  axis  across  the  columns.  The  FOE  in  Set  B  is  located  at  a  valid  grid  point 
while  the  FOE  in  Set  A  is  located  exactly  midway  between  four  grid  points.  Besides, 
Set  A  displays  a  relatively  large  rotation  along  the  optical  axis.  On  the  other  hand, 
the  dominant  rotation,  for  Set  B,  is  along  the  image  plane  axes.  In  addition,  the  FOE 
coordinates  are  well  distributed  in  [0,  N].  The  focal  length  of  400  pixels  reflects  a  normal 
field  of  view  of  35  degrees. 

Fig.  1  depicts  Set  A:  the  underlying  fractal  depth  map  is  shown  as  a  grayscale  image  in 
(a),  the  rotational  component  of  the  flow  field  in  (b),  the  translational  component  alone 
in  (c),  and  the  overall  synthesized  flow  in  (d).  Close  objects  are  bright  while  objects 
at  infinity  are  black  in  Fig.  1(a).  The  confounding  of  rotational  and  translational  flow 
is  clear  from  (b),(c)  and  (d).  In  particular,  there  are  areas  in  Fig.  1(d)  where  the  flow 
field  alternates  direction.  Such  inflections  occur  at  points  where  neither  the  rotation  nor 
translation  is  dominant.  Set  B  is  similarly  depicted  in  Figs.  2(a)-(d). 

5.1.2  Sensitivity  Analysis  on  Synthetic  Data 

Next,  we  look  at  the  performance  of  our  algorithm  on  the  synthetic  data  sets.  Sets  A 
and  B  will  serve  as  the  benchmark.  For  analyzing  the  sensitivity  of  our  algorithm  to  a 
particular  parameter  p,  we  generate  an  ensemble  of  flow  fields  using  different  values  of  p, 
keeping  the  remaining  parameters  fixed  to  those  of  A  and  B.  To  begin  with,  we  assume 
that  the  focal  length  is  known  accurately.  In  Section  5.1.7,  we  examine  the  effect  of  errors 
in  the  focal  length  estimate. 

5.1.3  Ideal  Case 

With  a  perfect  flow  field  and  accurate  focal  length  estimate,  our  algorithm  locates  the 
FOE  without  error.  The  rotational  error  e^  is  zero  for  Set  B  which  lies  on  a  valid  grid 
point,  but  non-zero,  though  small,  for  Set  A,  which  lies  between  grid  points.  These  results 
are  shown  in  Table  2.  The  rotational  velocities  and  error  are  in  10“^  rad.  Contours  of 


20 


Noise 

Sparsity 

cr 

Tj  (approx.) 

100% 

80% 

60% 

40% 

20% 

0.0 

0.00 

0.00 

0.00 

0.00 

0.00 

0.00 

0.1 

2.65 

0.00 

0.00 

0.00 

0.00 

0.00 

0.2 

5.29 

0.00 

0.00 

0.00 

0.00 

0.50 

0.4 

10.51 

1.00 

1.00 

1.21 

1.12 

0.50 

1.0 

25.08 

5.41 

9.80 

3.55 

6.77 

6.62 

Table  3:  FOE  estimation  error  e  as  a  function  of  the  noise  and  sparsity  of  the  input  flow 
field 

the  error  surface  E  are  plotted  in  Figs.  3(a)  and  (b)  for  sets  A  and  B  respectively.  The 
true  location  of  the  FOE  is  marked  by  a  +  and  the  estimated  location  by  a  x . 


5.1.4  Performance  with  Sparse/Noisy  Flows 


In  the  real  world,  optical  flow  is  seldom  determined  at  all  points  in  an  image  with  cer¬ 
tainty.  Besides,  the  flow  estimates  are  typically  noisy.  The  sparsity  of  flow  depends  on 
the  specific  optical  flow  algorithm  chosen,  the  presence  of  local  high-frequency  informa¬ 
tion,  and  the  existence  of  a  coherent  motion  across  the  image.  We  simulate  a  sparse 
flow  field  by  randomly  including  or  discarding  the  flow  at  a  given  pixel,  according  to  an 
i.i.d.  binary  distribution.  We  realize  noise  in  the  flow  field  by  adding  an  i.i.d.  zero- mean 
Gaussian  process  with  variance  cr  to  each  component  of  velocity.  The  noise  level  rj  is 
measured  according  to  the  angular  error  metric  employed  in  [32],  given  by 


rj  =  E 


cos 


-1  /  Vo  •  V 


,  |vo|ll|v 


(56) 


where  Vq  =  (u  u  1)',  v  =  Vq  -f  (t/u  rfy  0)'  and  rjuitfv  ~  cr).  t],  measured  here  in 
degrees,  is  insensitive  to  the  magnitude  of  the  motion  vector  and  offers  a  normalized 
measure  against  which  a  range  of  velocities  can  be  compared  meaningfully. 

We  simulated  flow  fields  corresponding  to  five  combinations  of  sparsity,  viz.  100%, 
80%,  60%,  40%  and  20%,  and  five  combinations  of  noise  level,  <t  =  0, 0.1, 0.2, 0.4  and  1.0. 
The  angular  error  r}  corresponding  to  a  is  approximately  the  same  for  a  given  a,  over 
sets  A  and  B  and  over  all  sparsity  levels.  The  error  e  in  estimating  the  FOE  using  our 
algorithm  is  tabulated  as  a  function  of  noise  and  sparsity  in  Table  3.  Fig.  4  represents 
the  worst  case  scenario.  The  flow  fields  generated  with  cr  =  1.0  are  shown  in  Fig.  4,  (a) 
Set  A  with  80%  density  and  (b)  Set  B  with  20%  density.  Figs.  4(c)  and  (d)  plot  the  error 
surface  contours  with  the  true  FOE  (-f)  and  estimate  (x)  corresponding  to  (a)  and  (b) 
respectively,  showing  good  compliance  of  our  solution. 


5.1.5  Efect  of  Depth  Structure 

Early  3D  SFM  techniques  that  assume  a  smooth  flow  field  tend  to  fail  when  there  are 
discontinuities  introduced  by  busy  depth  maps.  Likewise,  the  newer  techniques  that  ex¬ 
ploit  depth  non-negativity  minimally  require  busy  depth  maps  to  work  at  all.  One  of  the 
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Table  4:  Performance  as  a  function  of  depth  structure  quantified  by  the  fractal  exponent 
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Table  5:  Performance  as  a  function  of  field  of  view 


merits  of  our  approach  is  its  applicability  to  both  of  these  extremes  and  to  intermediate 
cases.  We  validate  this  claim  by  studying  performance  on  flow  fields  simulated  using  sev¬ 
eral  depth  maps  generated  by  a  range  of  fractal  exponents  d.  Moreover,  we  also  consider 
a  planar  depth  map  for  comparison,  given  by  ^planar(*’-?)  =  +  C'®*  +  Cyj-  An  imaged 

3D  planar  scene  gives  rise  to  this  form  of  depth  map. 

Retaining  the  FOE,  rotation  and  focal  lengths  of  sets  A  and  R,  we  generated  flow 
fields  for  the  planar  and  fractal  depth  maps  with  d  =  1.7, 1.5, 1.3  and  1.1.  The  high- 
frequency  content  of  the  depth  map  increases  with  decreasing  d.  The  results  of  FOE 
estimation  using  our  algorithm  are  shown  in  Table  4.  For  both  sets  A  and  5,  the  FOE  is 
estimated  with  zero  error  e.  The  angular  error  indicated  in  10“^  rad,  is  very  small  for 
A  and  zero  for  B.  This  is  not  surprising,  considering  that  the  true  FOE  for  A  lies  between 
grid  points.  Figs.  5(a)  and  (b)  show  the  extreme-case  depth  maps  which  are  planar  and 
fractal  respectively,  with  d  =  1.1.  The  corresponding  flows  generated  using  sets  A  and 
B  are  plotted  in  (c)  and  (d).  Finally,  the  error  surface  contours,  together  with  true  (-1-) 
and  hypothesized  (x)  FOE,  are  shown  in  Figs.  5(e)  and  (f)  respectively. 
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5.1.6  Performance  vs.  Field  of  View 

Until  now,  we  have  restricted  ourselves  to  flow  fields  generated  by  a  camera  with  a 
“normal”  field  of  view  (FOV),  which  is  typically  between  30  and  45  degrees.  The  nature 
of  the  flow  field  varies  considerably  as  the  focal  length,  or  equivalently,  the  FOV,  changes. 
A  robust  solution  to  the  3D  SFM  problem  must  operate  across  a  range  of  fields  of  view. 
With  noiseless  data,  our  algorithm  locates  the  FOE  without  error  for  /  between  200  and 
800  pixels,  corresponding  to  FOVs  between  18  and  65  degrees.  To  facilitate  a  better 
understanding  we  examine  the  performance  with  noise  in  the  flow  field.  We  generated 
flow  fields  for  sets  A  and  B,  with  noise  level  rj  set  to  10.0.  This  is  achieved  by  choosing 
appropriate  values  of  a. 

Table  5  provides  the  results  of  this  experiment.  The  FOE  is  estimated  with  reasonable 
accuracy,  and  the  rotational  error  (tabulated  in  10“®  rad)  is  very  small  for  all  cases.  As 
the  focal  length  increases,  so  does  the  a  needed  to  achieve  t)  =  10.0.  As  a  consequence, 
e  also  shows  an  upward  trend.  Fig.  6  shows  the  extreme  cases  of  our  experiment.  The 
synthesized  noisy  optical  flow  fields  corresponding  to  /  =  200, 800  and  800  pixels,  using 
parameter  sets  A,  A,  and  B,  are  shown  in  Figs.  6(a),  (c)  and  (e)  respectively.  The  error 
contours  together  with  true  (+)  and  estimated  (x)  FOE  are  shown  in  Figs.  6(b),  (d) 
and  (f).  Despite  obvious  large  perturbations  in  the  flow,  our  algorithm  performs  well  in 
locating  the  FOE. 

5.1.7  Effect  of  Mis-estimated  Focal  Length 

In  the  above  discussion,  we  have  assumed  that  the  focal  length  of  the  camera  is  known 
accurately.  This  is  realistic  in  the  real  world  as  the  physical  parameters  of  the  camera  are 
either  specified  or  can  be  measured  using  camera  calibration  algorithms.  Nevertheless, 
characterizing  the  sensitivity  of  our  algorithm  to  mis-estimated  focal  length  is  useful  since 
this  sensitivity  determines  the  deviation  of  our  FOE  estimates  when  the  focal  length  itself 
is  known  (or  estimated)  imprecisely.  Furthermore,  in  situations  where  the  focal  length  is 
altogether  unknown,  this  study  reveals  what  parameters,  at  a  minimum,  can  be  computed 
with  some  degree  of  reliability  using  an  arbitrarily  chosen  focal  length. 

Our  final  experiment  on  synthetic  data  sets  involves  estimating  the  FOE  from  noise¬ 
less,  dense  flow  fields  using  parameter  sets  A  and  B.  Unlike  the  previous  experiments,  the 
focal  length  assumed  in  the  computations  is  made  to  vary  over  the  range  200-800  pixels 
in  2°-^®  factor  multiples,  while  the  true  focal  length  is  fixed  at  400  pixels.  Table  6  provides 
a  summary  of  the  results  of  this  experiment.  Excluding  the  extreme  wide  angle  case  (200 
pixels),  the  FOE  estimate  is  very  good.  In  the  extreme  case,  the  FOE  is  displaced  by  15 
and  11  pixels  for  A  and  B  respectively.  These  results  lead  us  to  claim  that  the  algorithm 
is  relatively  insensitive  to  the  focal  length,  insofar  as  the  FOE  estimate  is  concerned.  For 
the  angular  velocities,  this  does  not  hold.  Ux  and  Uy  are  approximately  scaled  by  the 
ratio  of  the  true  focal  length  to  the  chosen  focal  length.  However  the  rotation  along  the 
optical  axis  is  relatively  robust  to  the  choice  of  focal  length. 

The  input  optical  flow  fields  for  this  experiment  axe  shown  in  Fig.  1(d)  and  Fig.  2(d). 
The  evolution  of  the  error  surface  contour  with  respect  to  focal  lengths  of  200,  283,  400, 
566  and  800  pixels  are  shown  in  Figs.  7(a)  and  (b).  Fig.  3(a),  Figs.  7(c)  and  (d)  for  the 
first  case,  and  in  Figs.  7(e)  and  (f).  Fig.  3(b),  Figs.  7(g)  and  (h)  for  the  second. 
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Table  6:  Performance  as  a  function  of  incorrectly  estimated  focal  length 
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Our  experiments  using  synthetic  flow  flelds  illustrate  the  strengths  and  limitations 
of  our  algorithm,  and  provide  useful  insight  regarding  its  domain  of  applicability.  In 
particular,  we  have  shown  that  our  approach  is  robust  to  all  the  commonly  encountered 
issues  in  flow  field  analysis,  primary  amongst  which  are  noise  and  sparsity.  It  remains  to 
be  seen  how  our  algorithm  can  be  applied  to  real-world  problems;  this  is  our  focus  in  the 
next  section. 

5.2  Experiments  on  Real  Data 

Although  our  technique  for  FOE  estimation  works  well  in  theory  and  in  simulations, 
applying  it  to  solving  real  world  problems  presents  a  whole  new  set  of  challenges.  In  the 
data  processing  chain,  our  algorithm  uses  a  precomputed  optical  flow  field  and  provides 
as  output  the  FOE,  rotational  velocity  and  depth  map  (up  to  a  scale  factor).  Thus,  its 
performance  is  to  some  extent  circumscribed  by  the  accuracy  of  the  flow  field  estimation 
pre-processing  stage.  While  the  FOE  and  rotational  velocity  estimates  are  reasonably 
robust  to  errors  in  the  flow  field,  the  same  is  not  true  of  the  depth  map.  This  is  inevitable, 
and  can  be  explained  by  the  redundancy.  Redundancy,  quantified  here  as  the  number  of 
equations  containing  the  relevant  term,  is  much  higher  for  the  FOE  and  rotation  com¬ 
pared  to  each  inverse  depth  estimate.  Moreover,  the  depth  map  is  particularly  unreliable 
near  the  FOE  since  even  small  errors  in  its  estimate  translate  into  large  relative  errors 
in  the  coefficient  corresponding  to  the  inverse  depth. 

Thus,  for  demonstrating  practical  applicability,  we  have  two  problems  at  hand,  viz. 
(1)  how  to  accurately  estimate  a  dense  optical  flow  field  and  (2)  how  to  effectively  use 
the  variably  unreliable  depth  map  that  our  algorithm  computes.  In  this  context,  it  must 
be  mentioned  that  our  algorithm  does  not  enforce  a  non-negativity  constraint  on  the 
computed  depth  map,  and  negative  estimates  are  very  likely  to  be  obtained.  In  one 
sense,  a  negative  estimate  for  depth  is  preferable  to  a  positive  estimate  with  large  error, 
since  the  former  is  obviously  invalid  while  the  latter  cannot  be  identified  as  such. 

5.2.1  Applications 

As  mentioned  in  the  introduction  to  this  paper,  3D  SFM  has  several  application  areas. 
We  have  considered  five  applications  which  are,  in  increasing  order  of  complexity,  3D 
stabilization,  rangefinding  or  depth  estimation,  independent  motion  detection,  obstacle 
detection,  and  3D  model  building  for  virtual  reality.  3D  stabilization  is  the  process 
in  which  the  jerky  3D  rotation  of  a  moving  camera  is  compensated  by  reversing  the 
rotation,  in  order  to  stabilize  the  image.  Depth  estimation  and  obstacle  detection  are 
self-explanatory.  Detection  of  independently  moving  foreground  objects  is  trivial  when 
the  camera  is  stationary.  However,  when  the  camera  is  itself  moving,  this  process  gets 
tricky.  One  cue  for  detecting  such  objects  is  to  check  for  consistency  of  3D  motion 
over  the  scene.  Likewise,  obstacle  detection  looks  for  areas  in  the  image  whose  irpth  is 
inconsistent  with  a  level  profile  of  the  ground.  3D  model  building  from  a  sequence  of 
images  is  the  ultimate  application  since  it  extracts  all  the  3D  information  in  a  sequence 
into  a  virtual  reality  model.  Re-creating  the  image  sequence  is  achieved  by  retracing 
the  3D  path  through  the  model  “world”.  Moreover,  alternate  paths  can  be  traversed, 
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generating  alternate  views  that  did  not  exist  in  the  original  sequence. 


5.2.2  Optical  Flow  Estimation 

Early  in  our  experiments,  we  noticed  that  the  standard  optical  flow  techniques  [32]  pro¬ 
vided  neither  the  accuracy  nor  the  density  needed  in  most  real  applications.  Despite  sev¬ 
eral  decades  of  work  on  differential  techniques  for  motion  estimation  (including  ours  [33]), 
the  standard  algorithms  were  unable  to  obtain  any  reasonable  flow  field  estimate  for  the 
real  video  sequences.  The  low  resolution  of  the  CCD  sensor,  temporal  aliasing  caused 
by  coarse  sampling  intervals,  and  unsteady  motion  of  the  camera  are  contributing  causes 
to  the  failure.  Barring  3D  stabilization  (where  depth  estimates  are  not  required),  all  the 
other  applications  use  an  exhaustive  search  based  image  matching  technique  developed 
by  us.  This  process  is  based  on  block  matching  with  a  small  refinement  to  give  subpixel 
estimates.  The  flow  field  used  for  3D  stabilization  is  generated  using  the  overlapped  basis 
optical  flow  field  formulation  [33].  This  is  sufficiently  accurate  and  very  fast  to  compute. 

There  are  three  functional  components  to  our  block  matching  based  optical  flow  esti¬ 
mation  method.  The  first  and  most  compute-intensive  component  is  the  block  matching 
itself.  Since  this  technique  is  based  on  matching,  it  operates  on  a  temporal  pair  of  im¬ 
ages.  For  each  pixel  in  the  current  image,  a  7  x  7  template  is  marked  around  the  pixel. 
This  template  is  compared  across  a  search  space  in  the  previous  image  using  the  absolute 
error  criterion.  The  shift  corresponding  to  the  minimum  total  absolute  error  across  the 
template  forms  the  integral  part  of  the  computed  flow.  We  have  not  used  any  accel¬ 
erated  search  technique  here  although  a  multiresolution  search  can  speed  up  this  step 
substantially. 

The  second  component  of  our  flow  computation  technique  is  the  estimation  of  subpixel 
shift.  For  this,  we  use  an  ad-hoc  rule.  We  compute  the  total  absolute  error  error  at  ±1 
pixel  from  the  best  shift  along  both  axes.  We  fit  a  second-degree  polynomial  to  the 
error  profiles  independently  in  each  direction.  We  pick  the  minimum  of  this  polynomial, 
which  can  be  shown  to  be  in  [—0.5, 0.5),  as  the  fractional  part  of  the  computed  flow.  The 
final  functional  component  is  the  determination  of  whether  or  not  there  is  sufficient  2D 
information  at  each  pixel  to  reliably  compute  flow.  For  this,  we  use  two  criteria,  viz.  the 
determinant  and  the  condition  number  of  the  matrix 


M  = 


II  IJy 
Uy  II 


(57) 


where  Ix  and  ly  are  the  image  gradients,  and  the  averages  are  computed  over  the  7x7 
template.  Only  when  the  determinant  and  condition  number  are  respectively  larger  and 
smaller  than  two  preset  parameters  is  the  pixel  flagged  as  one  for  which  flow  can  be 
computed.  In  practice,  the  integral  and  fractional  flow  are  computed  subsequent  to  this 
determination. 

The  subpixel  shift  estimator  is  not  particularly  accurate.  However,  with  no  subpixel 
estimates,  the  computed  depth  map  often  shows  a  sawtooth  pattern.  This  is  somewhat 
mitigated  with  the  above  approach  although  better  techniques  like  phase  correlation  will 
presumably  give  even  better  results.  One  way  of  improving  the  subpixel  estimates  is 
to  blur  the  input  images.  An  analysis  of  the  histogram  of  computed  velocities  shows 
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a  more  uniform  distribution  than  with  no  blur.  Undeniably,  several  possibilities  exist 
for  improving  the  performance  of  this  process.  In  summary,  our  flow  technique  is  crude 
though  very  effective. 

5.2.3  3D  Stabilization 

Stabilization  is  a  differential  process  that  compensates  for  the  “unwanted”  motion  in  an 
image  sequence.  In  typical  situations,  the  term  “unwanted”  denotes  the  motion  in  the 
sequence  resulting  from  the  kinematic  motion  of  the  camera  with  respect  to  an  inertial 
frame  of  reference.  In  these  situations,  the  “unwanted”  component  of  motion  does  not 
carry  any  information  of  relevance  to  the  observer,  and  indeed  strains  its  functioning. 
The  more  common  2D  image  stabilization  techniques  apply  an  interframe  translation, 
similarity,  affine  or  perspective  transformation  to  compensate  for  image  motion.  These 
often  perform  poorly  when  the  scene  is  richly  structured  in  3D. 

When  a  3D  scene  is  being  imaged  by  an  unsteady  camera,  the  resulting  image  motion 
is  a  result  of  the  camera  parallax  motion  (translation)  as  well  as  camera  rotation.  Since 
the  parallax  shift  cannot  be  compensated  for  and  is  often  deliberate  or  “wanted” ,  it  is  the 
rotation  that  must  be  anulled.  Computing  the  3D  rotation  in  an  image  sequence  requires, 
in  effect,  that  the  3D  SFM  problem  be  solved.  The  rotational  component  of  motion  is 
readily  computed  once  the  FOE  is  determined.  For  this  problem,  the  depth  structure  of 
the  scene  is  largely  irrelevant.  This  allows  us  to  use  the  overlapped  basis  technique  [33] 
for  computing  the  flow  field  with  no  detriment.  The  advantages  of  using  the  overlapped 
basis  flow  field  estimator  are  improved  accuracy  and  computational  speed. 

Figs.  8  (a)  and  (b)  show  the  first  and  hundredth  frames  of  the  Martin  Marietta 
sequence.  The  camera  is  mounted  looking  ahead  on  a  vehicle  as  it  traverses  unpaved 
terrain.  There  is  sufficient  texture  in  most  of  the  image,  and  the  interframe  displacements 
are  small,  permitting  differential  optical  flow  computation.  The  FOE  and  rotation  angles 
are  computed  using  our  algorithm.  The  estimated  pitch,  yaw  and  roll  plots  are  shown  in 
Figs.  8  (c),  (d)  and  (e)  respectively.  These  are  in  excellent  visual  compliance  with  the 
results  obtained  by  Yao  [34]. 

Fig.  9  demonstrates  the  effect  of  3D  stabilization.  Fig.  9(a)  shows  the  twentieth  frame 
of  the  sequence.  We  chose  this  frame  as  it  displays  higher  than  average  angular  deviation 
from  the  first  frame.  With  no  stabilization,  the  difference  between  the  twentieth  and  first 
frames  is  shown  in  Fig.  9(b).  The  fully  stabilized  image  (compensated  for  roll,  pitch  and 
yaw)  and  its  difference  from  the  first  frame  are  shown  in  Figs.  9(c)  and  (d)  respectively. 
In  the  difference  image,  areas  near  the  camera  show  larger  deviations  than  those  at  a 
distance.  This  is  the  effect  of  translation  of  the  camera. 

Since  our  algorithm  actually  computes  the  three  rotation  angles  for  each  frame,  we 
can  go  one  step  further  to  perform  “selective  stabilization” .  For  instance,  if  we  wish  to 
compensate  only  for  camera  roll,  we  disregard  the  effects  of  pitch  and  yaw  while  dero- 
tating  the  frames.  Fig.  9(e)  shows  the  twentieth  frame  of  the  Martin  Marietta  sequence, 
stabilized  for  roll  only.  The  difference  from  the  first  frame  is  shown  in  Fig.  9(f).  The 
parallel  horizon  and  mountain  profile  in  this  figure  reveals  the  unstabilized  pitch  and  yaw 
motion.  Extending  this  concept,  one  can  selectively  stabilize  for  certain  frequencies  of 
motion  to  eliminate  handheld  jitter  while  preserving  deliberate  camera  pan,  etc. 
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5.2.4  Independent  Motion  Detection 

Detecting  an  independently  moving  foreground  object  against  a  stationary  background 
is  trivial  if  the  camera  is  fixed.  Frame  differencing  is  often  sufficient  to  accomplish  this 
job.  When  the  camera  is  moving  with  respect  to  the  background,  more  sophisticated 
techniques  must  be  used.  If  the  background  can  be  assumed  to  be  approximately  planar, 
2D  stabilization  steadies  the  background.  The  moving  foreground  can  be  located  by 
frame  differencing  the  stabilized  image  sequence. 

In  a  true  3D  scenario  with  the  camera  undergoing  3D  motion  and  a  richly  structured 
3D  background,  no  global  image  transformation  can  stabilize  all  background  objects. 
Here,  we  use  the  consistency  of  the  computed  depth  map  that  solves  the  3D  SFM  as  a 
cue  to  locate  independently  moving  foreground  objects.  Areas  that  have  a  negative  depth 
or  very  small  positive  depth  are  marked  as  belonging  to  foreground  objects.  In  theory, 
this  is  not  a  sufficient  discriminant.  What  is  actually  computed  is  more  accurately  the 
“time-to-collision”  and  not  the  inverse  depth.  In  theory,  there  may  exist  areas  whose 
time-to-collision  with  the  image  plane  lies  within  valid  limits.  An  alternate  technique  is 
to  compute  the  difference  between  the  observed  optical  flow  and  that  calculated  using 
the  estimated  3D  motion  and  structure,  for  each  pixel  where  flow  is  known.  However,  in 
our  experiments  we  found  the  first  cue  sufficient. 

Fig.  10  shows  the  results  of  our  first  experiment.  Two  consecutive  frames  of  the  se¬ 
quence,  gathered  from  a  forward-translating  vehicle  on  a  highway,  are  shown  in  Figs.  10(a) 
and  (b).  The  computed  flow  between  these  frames  is  shown  in  Fig.  10(c).  The  depth  maps 
generated  from  analysis  by  our  algorithm,  and  after  processing,  are  shown  in  Figs.  10(d) 
and  (e)  respectively.  Here,  the  white  areas  are  those  where  no  depth  estimate  is  avail¬ 
able.  Background  regions  are  marked  in  light  gray.  Receding  and  approaching  areas  are 
indicated  by  dark  gray  and  black  respectively.  The  raw  depth  is  processed  by  a  series  of 
morphological  steps  of  erosion  and  dilation.  Fig.  10(f)  overlays  the  processed  result  on 
the  original  image.  It  can  be  seen  that  the  vehicle  near  the  center  of  the  frame  is  well 
segmented  as  a  reading  object  and  the  vehicle  near  the  edge  is  marked  as  an  approaching 
object. 

Our  next  experiment  demonstrates  a  situation  where  although  the  FOE  estimation 
mechanism  is  ill-suited,  the  result  is  very  accurate.  Figs.  11(3-)  and  (b)  are  two  successive 
frames  of  the  Radius  sequence.  The  camera  is  mounted  looking  sideways  from  a  moving 
vehicle.  The  camera  translates  along  the  image  plane  and  hence  the  FOE  lies  at  infinity 
(or  very  far  away  from  the  image  center).  The  central  assumptions  of  our  algorithm  are 
violated,  rendering  our  technique  inapplicable  in  principle.  The  computed  flow  between 
frames  is  shown  in  Fig.  11(c),  revealing  the  significant  rotation  as  well.  Our  algorithm 
estimates  the  FOE  to  lie  on  the  right-hand  side  at  the  periphery  of  the  search  space,  which 
is  itself  axbitrarily  offset  to  the  right  by  300  pixels.  This  indicates  the  likelihood  that 
the  true  minimum  lies  even  further  beyond.  The  flow  reconstructed  from  the  computed 
depth  map  and  motion  parameters  is  shown  in  Fig.  11(d).  Visually,  this  is  in  excellent 
agreement  with  the  input  flow.  The  processed  depth  cue  is  shown  in  Fig.  11(e)  with  the 
same  color  legend  as  used  in  Fig.  10.  The  pylon  is  marked  as  a  distant  object  and  the 
vehicle  is  segmented  out  well  (Fig.  11(f)). 
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5.2.5  Rangefinding 

Accuracy  in  the  depth  estimates  is  not  very  critical  to  the  process  of  locating  mov¬ 
ing  foreground  objects.  Accuracy  assumes  a  higher  importance  for  depth  estimation  or 
rangefinding.  The  final  applications,  obstacle  detection  and  3D  model  generation,  build 
on  the  rangefinding  process.  Results  of  our  experiments  are  shown  in  Figs.  12  and  13. 
Subfigures  (a)  and  (b)  are  two  consecutive  frames  of  the  sequence,  (c)  is  the  estimated 
optical  flow  and  (d)  is  the  computed  depth  map.  White  areas  indicate  no  flow  and 
therefore  no  depth  estimate.  Darker  regions  are  farther  from  the  camera. 

Fig.  12(e)  is  a  plot  of  the  depth  as  a  function  of  image  ordinate.  Correcting  for 
projection,  Fig.  12(f)  plots  depth  vs.  coordinate  along  the  horizontal  axis.  All  rows  of 
the  image  are  collapsed  in  these  plots.  Likewise,  Fig.  13(e)  is  a  plot  of  depth  vs.  image 
abscissa  and  (f)  corrects  for  projection.  The  cylindrical  profile  of  the  sponge  is  seen  as 
the  arc  formed  by  the  cluster  of  plots  in  Fig.  12(f).  Similarly,  the  ground  plane  shows  up 
as  the  linear  cluster  of  points  in  the  lower  half  of  Fig.  13(f) 

Closer  examination  of  Figs.  12  and  13  shows  certain  periodicities  in  plots  (e)  and  (f). 
This  is  a  result  of  imperfect  subpixel  flow  estimation.  Our  observation  has  been  that 
despite  the  subpixel  correction  explained  in  Section  5.2.2,  a  histogram  of  velocities  shows 
a  strong  preference  for  integer  shifts.  This  causes  a  “staircase”  effect  in  the  flow  estimate 
which  is  accentuated  in  the  computed  depth  map.  But,  the  cluster  of  row-wise  plots  of 
the  depth  in  Fig.  12  (column-wise  for  Fig.  13)  smooths  out  this  artifact. 

5.2.6  Obstacle  Detection 

Building  on  the  rangefinding  mechanism,  we  fit  a  ground  plane  to  the  computed  inverse 
depth  map.  A  plane  in  3D  shows  up  as  a  planar  function  relating  the  inverse  depth  to 
the  image  coordinate.  Let  the  ground  plane  be  given  by  AX  -f  BY  -t-  CZ  =  1.  In  the 
image  coordinate  system, 

^/f +  B/I  +  C/  =  4 

Ax  +  By  +  Cf  =  /| 

ax  +  by  +  c  =  h{x,y)  (58) 

which  is  a  planar  function  for  the  (scaled)  inverse  depth  h{x,y).  We  fit  a  plane  to  the 
valid  values  of  the  computed  inverse  depth  and  look  for  significant  deviations  from  this 
plane.  While  fitting  the  ground  plane,  we  consider  only  the  lower  two- thirds  of  the  image 
area,  assuming,  as  a  rule  of  thumb,  that  the  top  portion  of  the  image  looks  above  the 
horizon.  Some  morphological  operations  are  used  to  clean  up  the  detected  regions  of 
interest. 

Figs.  14  and  15  show  the  results  of  two  experiments  on  obstacle  detection,  (a)  and 
(b)  are  consecutive  frames  for  which  the  flow  field  is  shown  in  (c).  (d)  is  the  computed 
inverse  depth  map,  where  white  regions  are  areas  where  no  flow,  and  therefore  no  depth 
estimate,  is  available.  Darker  areas  are  closer  to  the  camera,  (e)  is  a  contour  plot  of  the 
magnitude  of  the  deviation  of  the  computed  inverse  depth  from  the  planar  fit.  The  final 
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results  showing  detected  obstacles  superimposed  on  the  original  image  are  shown  in  (f). 
False  alarms,  where  present,  are  small  and  the  segmentation  is  very  good. 

5.2.7  3D  Model  Building 

In  this  final  experiment  involving  real  data,  we  perform  an  exploratory  investigation 
of  the  ultimate  application  of  3D  SFM,  viz.  the  process  of  building  3D  models  from 
image  sequences.  Here,  the  processing  chain  does  not  terminate  upon  computation  of 
the  depth  map,  or  upon  locating  outliers  from  the  ground  plane.  Rather,  a  significant 
portion  of  the  effort  is  directed  towards  digesting  the  computed  depth  map  values  into  a 
meaningful  scene  model.  Even  the  relatively  simple  modeling  technique  used  by  us  in  this 
experiment  is  highly  compute-intensive.  Here,  more  than  in  the  previous  examples,  the 
overall  accuracy  of  the  process  hinges  on  the  pre-  and  post-processing  stages.  “Accuracy” 
is  used  here  as  a  subjective  figure  of  merit. 

Our  3D  modeling  paradigm  is  built  around  the  Virtual  Reality  Modeling  Language 
(VRML).  VRML  offers  a  comprehensive  vocabulary,  ubiquity,  and  the  accompanying 
visualization  tools  that  allow  us  to  concentrate  on  building  rather  than  rendering  the 
model.  We  build  our  3D  models  with  only  planar  faces.  Each  planar  face  is  bounded  by 
a  polygon  which  is  not  necessarily  convex.  Thus,  each  face  is  described  by  its  bounding 
polygonal  vertices  in  3D,  and  a  superimposed  texture  map.  Building  the  3D  scene  there¬ 
fore  involves  breaking  up  the  2D  image  into  a  set  of  polygonal  regions  whose  internal 
pixels  lie  approximately  on  a  plane,  followed  by  computing  the  orientation  and  location 
of  each  planar  region.  It  can  be  naively  claimed  that  the  latter  step  can  be  solved  using 
(58),  so  what  remains  is  to  accurately  segment  the  given  scene  into  polygonal  regions. 
However,  there  are  a  few  hidden  complications  that  provide  daunting  challenges  at  all 
stages  of  processing.  These  steps  are  described  below  in  detail. 

•  Flow  Computation  and  FOE  Estimation:  As  in  our  previous  experiments,  we  use 
two-frame  full-search  block  matching  to  determine  the  optical  flow.  As  before, 
subpixel  flow  estimates  axe  critical  to  the  overall  accuracy.  This  is  followed  by  esti¬ 
mating  the  focus  of  expansion  and  inverse  depth  map.  The  output  is  a  potentially 
sparse  set  of  inverse  depths  over  the  image  area. 

•  Image  Segmentation:  In  parallel  to  the  previous  step,  we  segment  the  image  into  ar¬ 
eas  we  think  fit  well  to  planes  in  3D.  One  possible  method  is  to  manually  demarcate 
these  segments.  Choosing  to  perform  this  task  automatically,  we  have  developed  a 
simple  system  to  segment  the  image  into  areas  of  almost  uniform  intensity.  Here, 
we  make  a  critical  assumption  that  adjacent  pixels  of  similar  gray  level  belong  to 
the  same  physical  plane.  This  holds  reasonably  well  in  real  imagery  since  intensity 
differences  often  exist  between  foreground  and  backgrormd  objects.  The  relative 
depth  difference  within  each  object  is  nearly  zero  compared  to  the  absolute  depth 
from  the  camera.  This  approach  leads  us  to  a  paradox.  Segmented  regions  axe 
largely  smooth  at  their  interiors  and  have  large  derivatives  at  their  boundaries. 
Often,  a  boundary  is  fragmented  into  numerous  tiny  regions  of  no  practical  value 
in  3D  model  building.  But,  it  is  at  these  high-derivative  pixels  that  good  flow  esti¬ 
mates  are  available.  Thus,  useful  and  reliable  flow  estimates  are  available  mostly  at 
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and  outside  the  periphery  of  segmented  regions,  while  most  of  the  interior  provides 
scanty  information. 

In  order  to  minimize  the  wastage  of  useful  flow  information  in  fragmented  peripheral 
regions,  it  is  necessary  to  draw  crisp  boundaries.  Strong  edges  must  be  reinforced, 
and  weak  edges  suppressed,  as  a  precursor  to  region  growing.  We  use  the  Grad¬ 
uated  Non-Convexity  (GNC)  algorithm  [35]  to  perform  discontinuity- preserving 
image  smoothing.  The  image  intensity  data  is  made  to  fit  to  a  membrane  which  is 
allowed  to  break  under  certain  stresses.  The  stiffness  of  the  membrane  governs  the 
smoothing,  and  its  yield  point  governs  the  ability  to  preserve  discontinuities.  The 
GNC  algorithm  is  iterative,  starting  with  a  convex  cost  function  until  convergence 
at  the  desired  error  cost  function. 

The  GNC  step  is  followed  by  region  growing.  After  segmenting  the  image  into 
near  equal  intensity  regions,  the  boundaries  of  these  regions  axe  vectorized  to  form 
polygons.  Segmentation  can  be  improved  by  using  more  sophisticated  techniques 
like  active  contours,  or  using  more  informative  data  like  color  images. 

•  Plane  Estimation:  Using  the  reasoning  of  (58),  we  can  develop  a  linear  system 
of  equations  relating  the  image  plane  positions  and  corresponding  scaled  inverse 
depths  for  points  within  each  region  where  the  depth  is  known.  However,  this  is 
an  incomplete  and  logically  flawed  solution.  This  is  because  the  computed  plane 
must  have  positive  depth  throughout  the  interior  of  its  corresponding  region.  A 
standard  linear  system  of  equations  does  not  guarantee  this.  To  better  illustrate 
this  point,  consider  a  one-dimensional  simplification. 

Let  there  be  ten  equally  spaced  data  points  {d*-,  i  =  0, 1, . . . ,  9},  of  which  only  the 
first  two  are  known.  Let  do  =  1  and  di  =  0.8.  With  no  constraints,  the  best 
line  fitting  these  data  points  does  so  exactly,  and  extrapolates  di^i  =  6, 7, 8, 9  to  be 
negative.  Since  this  is  unacceptable,  the  non-negativity  constraint  must  be  imposed 
on  points  where  the  data  is  not  known.  In  the  ID  case,  it  is  sufficient  to  impose  this 
constraint  at  the  two  endpoints  of  the  data  vector.  When  the  data  to  be  fit  has  a 
domain  in  3?^  and  the  model  is  planar,  it  is  sufficient  to  require  that  non-negativity 
be  satisfied  at  the  boundary  of  the  region. 

Let  be  the  set  of  x  and  y  coordinates  and  corresponding  scaled  inverse 

depths  at  points  with  known  flow,  for  a  particular  image  region.  Also,  let  {xj,  yi)  be 
the  set  of  periphery  points  for  the  same  region.  We  have  the  following  constrained 
minimization  for  solving  for  the  plane  parameters  (a,  6,  c): 

m\ii^{axi-\-hyi  +  c-dif  s.t.  ax,  +  -h c  >  0.  (59) 

i 

This  is  a  quadratic  programming  problem  with  linear  constraints.  We  include  the 
constraints  by  forming  a  composite  cost  function 

*^(^)  =  Yl  +  c  -  d,)^  -I-  A  ^  p  {axi  +  byi  +  c) 

i  i 

f  x^  X  <  0  ,  ^ 

=  I  0  x>0  (60) 
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with  a  penalty  A  which  is  gradually  increased  from  zero.  Minimizing  J(A)  is  the 
most  computationally  expensive  step  of  the  process.  Faster  solutions  [37]  are  com¬ 
mercially  available  as  software  packages. 

•  Conversion  to  VRML:  At  this  stage,  we  have  a  list  of  planar  polygons  in  3D.  The 
3D  coordinates  of  their  vertices  are  known  and  lie  ahead  of  the  camera.  Two 
issues  remain  in  converting  this  list  to  a  usable  format.  First,  the  polygons  can 
potentially  be  non-convex.  Moreover,  they  may  be  multiply  connected  {i.e.  they 
may  have  “holes”  in  them).  In  either  case,  the  polygons  are  broken  up  recursively 
into  triangular  faces  until  only  a  singly  connected  convex  polygon  remains.  Together 
with  the  triangles,  this  final  polygon  forms  the  3D  model  of  its  parent  region. 

The  second  issue  is  one  of  mapping  a  texture  onto  each  face.  Since  the  normal 
view  of  the  scene  is  given,  the  projection  of  the  texture  on  each  face  is  known.  In 
order  to  determine  the  texture  map  used  to  overlay  the  face,  the  known  projection 
must  be  rewarped  to  the  plane  of  the  face.  Although  this  is  not  mathematically 
complicated,  tricky  data  storage  issues  are  involved.  We  have  chosen  not  to  carry 
out  this  rectification  on  the  texture  data,  at  the  cost  of  enduring  visual  distortions 
in  our  experimental  results.  In  our  opinion,  this  reprojection  is  better  suited  to  be 
merged  with  the  rendering  mechanism. 

•  Experiment:  Using  the  procedure  outlined  above,  we  performed  an  experiment  on 
the  image  pair  data  shown  in  Figs.  16(a)  and  (b).  The  computed  optical  flow  and 
inverse  depth  map  are  shown  in  Figs.  16(c)  and  (d)  respectively.  In  the  latter,  white 
areas  are  those  where  no  valid  depth  estimate  exists.  Fig.  16(e)  shows  the  segmented 
regions.  Certain  areas  like  the  road  are  oversegmented,  i.e.  several  adjacent  regions 
correspond  to  the  same  plane.  On  the  other  hand,  choosing  parameters  to  produce 
fewer  regions  leads  to  undersegmentation,  e.g.  the  vaji  being  merged  with  the 
sky,  which  is  more  undesirable.  We  solve  the  quadratic  programming  step  using  a 
conjugate  gradient  algorithm.  The  plane  parameters  (a,  6,  c)  axe  computed  for  each 
region  and  the  inverse  depth  map  reconstructed  as  shown  in  Fig.  16(f). 

Fig.  17  shows  six  rendered  views  of  the  3D  model  we  generated.  The  first  image. 
Fig.  17(a),  is  the  rendering  from  the  normal  viewpoint.  “Cracks”  in  the  image  are 
due  to  the  polygonal  boundary  approximation  of  regions.  Also,  tiny  fragmented 
regions  are  rejected  by  the  algorithm  and  and  are  not  rendered.  Note  that  the 
realism  of  our  generated  model  is  better  seen  using  a  VRML  browser  than  through 
printed  images.  As  mentioned  earlier,  the  texture  maps  are  not  rectified,  giving 
rise  to  systematic  distortions  in  the  images. 

The  first  synthetic  viewpoint  is  from  above  the  normal.  The  ground  drops  out 
while  the  van  and  other  distant  objects  remain  almost  at  the  same  level  as  before 
(Fig.  17(b)).  Next,  we  generate  the  views  to  the  left  and  right  of  the  normal,  shown 
in  Figs.  17(c)  and  (d)  respectively.  The  road  surface  warps  accordingly.  Finally,  we 
move  the  viewpoint  ahead  of  the  normal  in  Figs.  17(e)  and  (f).  The  ground  diverges 
outward  and  the  rest  of  the  image  changes  as  expected.  This  exploratory  study 

^Available  at  http://www.cfar.umd.edu/  shridhar/Demos/index.html 
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shows  the  feasibility  of  using  our  algorithm  to  generate  3D  models  from  video,  and 
also  analyzes  the  post-processing  steps,  which  are  indeed  more  complicated  and 
crucial  than  depth  estimation. 

6  Conclusions 

The  3D  structure  from  motion  problem  is  very  interesting  both  from  the  theoretical 
and  the  application  points  of  view.  Although  in  theory  3D  motion  and  depth  can  be 
recovered  simultaneously  from  a  flow  field,  the  solution  has  proven  to  be  difficult.  3D  SFM 
provides  valuable  cues  for  depth  estimation,  3D  stabilization,  robotic  navigation,  obstacle 
avoidance,  time  to  collision  and  virtual  reality  model  generation.  But  a  theoretically 
sound,  robust  and  computationally  tractable  solution  has  eluded  researchers.  In  this 
paper,  we  have  presented  what  we  believe  is  a  viable  solution  to  the  problem. 

Our  motivation  in  this  work  has  been  to  come  up  with  an  elegant  solution  that  fully 
exploits  the  linear  dependence  of  the  optical  flow  on  the  focus  of  expansion  and  on  the 
scaled  inverse  depth  map.  The  fundamental  result  in  this  paper  is  a  theoretical  proof  of 
our  claim  that  a  partial  search  for  the  focus  of  expansion  is  computationally  equivalent 
to  performing  a  finite  number  ®  of  2D  FFTs.  Our  experimental  results  on  a  wide  variety 
of  synthetic  data  representing  noise  in  the  flow  field,  sparsity  of  the  computed  flow, 
uncertainty  in  the  focal  length  estimate,  type  of  underlying  depth  map  and  field  of  view 
demonstrates  the  correctness,  robustness  and  performance  envelope  of  our  algorithm. 
We  show,  through  a  variety  of  experiments,  the  utility  of  our  algorithm  for  performing 
3D  stabilization,  rangefinding,  independent  motion  detection,  obstacle  detection,  and  3D 
virtual  reality  model  building.  Our  experiments  validate  the  theory  behind  our  approach, 
and  our  claims. 

3D  SFM  is  an  old  problem  and  has  received  much  attention  over  the  years.  In  par¬ 
allel,  the  computational  power  available  to  the  image  analyst  has  steadily  increased  over 
time.  At  this  juncture,  it  is  viable  to  use  fast  search-based  techniques  to  solve  computer 
vision  problems.  An  advantage  of  our  algorithm  is  its  ready  portability  to  digital  signal 
processors  (DSPs).  The  powerful  FFT  support  of  DSPs  makes  it  conceivable  to  build  an 
extremely  fast  (possibly  parallel)  DSP-based  computing  engine  to  implement  our  algo¬ 
rithm.  Having  a  robust  solution  to  3D  SFM  will  allow  future  researchers  to  concentrate 
their  efforts  on  higher  level  vision  steps,  such  as  primitive  modeling  and  logical  inference, 
in  building  computer  vision  systems 
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Appendix 


A  Proofs  of  the  Fast  Computability  Theorems 
A.l  Proof  of  Theorem  1 


9(^0,  io)  = 


N^-l 

(Xt  bi  [^0?  Jo] 

t=0 

iV^-l 

^  ^  bi  S  [i/iVJ  jiiriodN  [^0?  Jo] 

t=0 

N-1  N-1 

[*o,  io] 

z=0  j=0 
N-1  N-1 

E  E  Cus 

t=0  j=0 


N-1  N-1 


E  E  c«s 

i=0  j=0 


(t-|-zo )  modAT,  (j + jo )  modiV 


(61) 

(62) 


where  the  N  x  N  matrix  C  is  comprised  of  component-wise  products,  C,j  =  aiN+jhiN+j- 

(62)  is  a  2-D  spatial  correlation  of  the  “image”  C  with  a  sliding  “template”  S.  Each 
element  of  the  resultant  matrix  after  correlation  is  the  value  of  q{io,jo)  for  the  appropriate 
shift.  This  operation  can  be  performed  in  0(iV^  log  N)  operations  using  Fourier  domain 
techniques. 

Let  ^  be  the  discrete  Fourier  transform  operator.  Form  the  N  x  N  matrix  Q  s.t. 
Qi,j  =  q{hj)-  We  have 

:FQ{k,  1)  =  J^c(-k, 1)  (63) 

where  the  discrete  Fourier  transforms  J-q,  J-q  ^rid  ^re  defined  on  the  matrices  C, 
S  and  Q  respectively.  We  can  use  the  Fast  Fourier  Transform  to  compute  the  terms  in 

(63) ,  when  N  =  2’’,  in  0{N^  log  N)  steps.  The  product  in  Fourier  space  takes  0{N‘^)  op¬ 
erations,  requiring  an  overall  complexity  of  log  TV)  for  the  computation  of  q{io,jo)- 


A. 2  Proof  of  Theorem  2 


Writing  out  a  and  b  in  terms  of  their  components 


and  using 


a  =  (  co.o  00,1  oi,o  oi,i 

. .  .  QN-lfi  ajV-1,1  ) 

(64) 

&  =  (  ^0,0  ^0,1  bifi  61,1 

•  •  •  b^-ifi  ) 

<^[^5^0]  =  BlockDiagjs 

Li7iVJ,»modiv[*0,  jo]} 

(65) 
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we  get 


9(*o,io)  =  a'S[io,jo]b 


~  XI  {®j,0^i,oS[i/Arj,iniod7Voo[*05io]  +  [j/;VJ,jmodJVoi [*0, io] 

izzO 

+  <^i,lbi,oS[i/Nl,imodNiQ[io,  jo]  +  Oj,i6i,iS[,yjVJ,jmodiVii[*05io]} 

=  ®»-0^*.O^b7^J.imodiVoo[*0,io]  ai,0^>i,lSLi/yVJ,imodNoi[*0,io] 

j=0  2  .  ®*>l^«iOSLt/iVJ,imodiVjQ[*0?io]  Jo] 


i=0  2  L 

=  EEEfn*:  .ioo®7ioo[*0’7o]  1 

2  t=0  j=o  .  ^»>Jio^*tiio[*05io] 

Y  [  ^l~0  1:^=0  Cij,,SijJio,jo]  Elo^  EU  Ci,j,,SijJio,jo]  J 

E2  denotes  the  sum  of  the  four  entries  of  the  2x2  matrix  summand.  C  €  3J2JVx2Ar  jg 
formed  by  the  componentwise  product  of  a  and  b: 

=  i  C  C yl'"'  )  -  l‘  =  iN  +  j  (67) 

Each  of  the  entries  of  the  summand  matrix  in  (66)  is  of  the  form  (61)  and  is  FC  according 
to  Theorem  1.  The  four  components  of  the  signals  for  each  pair  (io,io)  can  be  summed 
in  0{N^)  operations.  The  overall  complexity  of  evaluating  q{iojjo)  over  the  permissible 
values  of  io  and  jo  is  0{N^  log  N).  Hence,  ^(io,  yo)  is  FC. 

A. 3  Proof  of  Theorem  3 


9(^0, io)  —  0‘'S{io-,jo)b 

N^-l 

^  ■/  ^ibi^\i/N\+io,iiaodN+jo 
i=0 

N-l  N-1 

~  E  E  *^iN+jbiN+j^i+io,j+jo 

iz=0  j=0 


Define  C  e  3?2Arx2iv 


^  /  (^iN+jbiN+j  ijj  €  {0, 1 . . .  iV  —  !)■ 

1  0  otherwise. 


Noting  that 


^*+Jo,i+jo  ®(»+io)mod(2iV),(j+jo)inod(2A'')j  ^^lij^Oiio  G  0,  1  .  .  .  A  —  1  ('^f^) 


we  get 


2Ar-l  2iV-l 

^(^O’io)  —  ^2  E  ^*.i®(*+»o)mod(27V),(i+jo)mod(2JV)?  Viojio  6  {0,  1  .  .  .  TV  —  1}  (71) 

2=0  j=0 
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which  is  of  the  form  (62),  with  2N  replacing  N.  Thus  ?(io,io)  can  be  computed  in 
0{{2Ny  log(2iV))  =  0{N^  log  N)  operations,  proving  the  theorem.  Note  that  the  Fourier 
technique  will  calculate  q{io,jo)  over  a  larger  domain,  viz.  {0, 1 . . .  2iV  —  1}^,  of  which 
the  useful  values  are  lie  within  the  “North-West”  quarter  of  the  matrix. 


A.4  Proof  of  Theorem  4 

Writing  out  a  and  b  in  terms  of  their  components  as  in  (64)  we  get 
q{ioJo)  =  a'S{io,jo)b 

—  bifi  [S[iy;VJ+io,imocUV+jo]oO  "I"  Ci,0  ^i,l  [S[eyiVJ+io,imodiV+jo]oi 

t=0 

bjfi  [S|^jy7VJ^-j(,jjiiiodAf+io]lO  d”  [i/ATJ+iojimodAT+jolll} 

_  C,jv+j,0^iiV+i,o[Si+jo  J+jo]oo  <j^jiV+j,0^ziV+j,l[®*+*0)i+io]oi  (72) 

2  ^  .  °*W'+i.i^*A^+j.o[St+io,j+io]io  aiJV+j,i&iiV+i,i[St+io.i+io]ii  J 

As  in  A. 3,  we  define  C  G  by 

r  Co,0  •  •  .  Co,2JV-l  1 


C  = 


C2iV-l,0  C2Ar-l,2iV-l  J 

O'iN+jfi  biN+j,0  CliN+jfibiN+j,l 
<iiN+j,l  biN+j,0  0,iN+j,lbiN+j,l 


ij  €  {0, l...iV-  1} 


Using  the  reasoning  in  (70),  we  get 


qiiojo)  = 

2  . 


9oo  <?oi 
9io  9ii 


otherwise. 


Vfo,io€{0,l...iV-l} 


2N-1  2N-1 

900  =  ^0/  ^®>ioo[®(®+®o)niod(2JV),(j+jo)mod(2JV)]oO 

t=0  j=0 

2N-1  2N-1 

qoi  =  E  E  01  (*+*o )  mod(2iV) ,  ( j+jo )  mod(2iV)]  01 

i=0  j=0 

2N-1  2N-1 

9l0  =  C'iJlo[®(*+*o)niod(2N),(j+jo)mod(2iV)]lO 

j=0  i=0 

2N-1  2N-1 

qii  =  E  E  ^*5ill[^(*+*o)mod(2iV),(i+io)niod(2iV')]ll  (74) 

i=0  j=0 

As  before,  J22  denotes  the  sum  over  the  four  components  of  the  summand  matrix.  Each 
of  the  terms  qki  is  of  the  form  (71)  and  is  FC  by  Theorem  3.  Since  the  components 
themselves  can  be  summed  in  0{N^)  operations,  the  overall  complexity  in  computing 
qiioijo)  over  the  domain  {0, 1 ...  A  —  1}^  is  0{N‘^  \ogN),  proving  the  theorem. 
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B  FastCompute  Specification 


Objective: 

Input: 

Output: 

Internal  Data: 
Data  Structures: 


Steps: 


To  compute  ?(io,io)  =  a'M(«o,io)6,  Vio,io  G  {0, 1 . . .  -  1}. 

Vectors  a,  6  €  . 

Matrix  Q  e  where  Qij  =  q{ij). 

M,  given  by  (42). 

Matrices  Moo,Moi,Mii  G  initialized  to 

=  ayO  -  ;v  +  i)2 
=  -N+  1)0'  -  iv  + 1) 


(i-iv  +  i)2  +  (y-jv  +  i)2 

'ii,P  G  ^2iVx2Ar^  initialized  to  zero. 
Complex  valued  matrices  Moo,  Moi,  Mii,  Coo,  Coi,  Cio,  Ai,  Q  G 

1.  Initialize  data  structures  as  above. 


Mooij 

Afoitj 

Miiij 

CXiJ 

Matrices  Coo,  Coi,  Cio, 


2.  Compute  the  discrete  Fourier  transforms  Moo,Moi  and  Mu  of 
Moo,Moi  and  Mu  respectively. 

3.  Write  out  a  and  b  as  in  (64). 

4.  Redefine  Coo,  Coi,  Cio,  Cu  in  the  upper  quarter  as 


Coojj 

CoiiJ 

Cioij 

Ciijj 


^iN+j,0  biN +j,0 

aiN+j,obiN+j,i 

0'iN+j,lbiN+jfl 

(>’iN+j,lbiN+j,l 


(76) 


5.  Compute  the  discrete  Fourier  transforms  Coo,  Coi,  Cio  and  Cu  of 
Coo,  Coi,  Cio  and  Cu  respectively. 

6.  Set  Q  to 

[0]*,/  =  [Coo]fc,/[li^oo]fc,/  +  [Coi]fc;[Moi]A,/ 

+  [Cio]fc,;[Moi]A:,i  +  [Cii]^_,[Mii]fc,/  (77) 

since  for  a  real  signal  x(n),  its  Fourier  transform  x{k)  displays 
conjugate  symmetry,  i.e.  x{k)  =  x*{-k).  Note  that  this  requires 
complex  arithmetic.  This  summation  performs,  in  effect,  the  sum 
over  components  (E2)  in  A.4  since  J^-'^{x) =  J^-^{x  +  y). 
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^  A 

7.  Compute  the  inverse  discrete  Fourier  transform  Q  of  Q. 

8.  Populate  the  output  matrix  Q  copying  from  Q: 

Qi,j  =  Qi,v  i,j  E  {0,1...  N  -1} 
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Figure  1;  Set  A:  (a)  shows  the  underlying  fractal  inverse  depth  map,  (b)  the  rotational 
flow,  (c)  the  translational  flow  and  (d)  the  total  optical  flow  field. 


Figure  2:  Set  B:  (a)  shows  the  underlying  fractal  inverse  depth  map,  (b)  the  rotational 
flow,  (c)  the  translational  flow  and  (d)  the  total  optical  flow  field. 
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Figure  4:  Worst-case  performance  with  a  noisy,  sparse  flow  field:  (a)  Set  A  at  80% 
density,  (b)  Set  B  at  20%  density,  (c)  and  (d)  error  surface  contours  with  true  FOE  (-(-) 
and  estimated  FOE  (x)  corresponding  to  (a)  and  (b)  respectively.  Zero-mean  Gaussian 
noise  with  cr  =  1.0  has  been  added  to  (a)  and  (b). 
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Figure  5:  Performance  vs.  Depth  Map  Structure:  Two  extreme  depth  maps  are  shown 
here,  (a)  shows  the  planar  depth  map  and  (b)  is  a  fractal  depth  map  with  exponent  1.1. 
(c)  and  (d)  show  flow  fields  generated  with  parameter  sets  A  and  B  and  depth  maps  (a) 
and  (b)  respectively.  Corresponding  error  contours,  with  true  and  estimated  FOEs,  are 
plotted  in  (e)  and  (f). 


(«)  (f)  (g)  (h) 


Figure  7:  Performance  vs.  Estimated  Focal  Length:  (a)-(d)  error  contours  for  Set  A, 
(e)-(h)  for  Set  B.  The  true  focal  length  is  400  pixels,  (a)-(d),  and  (e)-(h)  use  incorrest 
estimates  of  200,  283,  566  and  800  pixels  respectively. 
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Figure  8:  3D  Stabilization:  (a)  first  and  (b)  hundredth  frame  of  Martin  Marietta 
quence,  (c)  pitch,  (d)  yaw  and  (e)  roll  as  a  function  of  frame  number 


Figure  9:  3D  Stabilization:  (a)  twentieth  frame  of  Martin  Marietta  sequence,  (b)  differ¬ 
ence  between  first  and  twentieth  frame  with  no  stabilization,  (c)  fully  stabilized  twen¬ 
tieth  frame,  (d)  stabilized  difference,  (e)  stabilized  only  for  roll,  (f)  difference  between 
roll-stabilized  frame  and  the  first  frame  of  the  sequence 
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Figure  10:  Independent  motion  detection:  (a)  and  (b)  two  consecutive  frames  of  the 
sequence,  (c)  computed  flow,  (d)  computed  depth  map,  (e)  raw  and  (f)  cleaned  regions 
showing  independent  motion 
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(e) 


Figure  11:  Independent  motion  detection:  (a)  and  (b)  two  consecutive  frames  of  the 
Radius  sequence,  (c)  computed  flow,  (d)  reconstructed  flow  from  computed  3D  motion 
parameters  and  depth  map,  (e)  segmented  areas,  (f)  cleaned  regions  showing  independent 
motion 


51 


Figure  12:  Rangefinding:  (a)  and  (b)  two  consecutive  frames  of  the  sequence,  (c)  com¬ 
puted  flow,  (d)  computed  depth  map,  (e)  plot  of  depth  as  a  function  of  image  ordinate 
over  the  entire  height  of  the  image,  (f)  plot  of  depth  along  the  horizontal  axis;  the  cylin¬ 
drical  profile  of  the  sponge  is  evident 


(e)  (f) 


Figure  13:  Rangefinding:  (a)  and  (b)  two  consecutive  frames  of  the  sequence,  (c)  com¬ 
puted  flow,  (d)  computed  depth  map,  (e)  plot  of  depth  as  a  function  of  image  abscissa 
over  the  entire  width  of  the  image,  (f)  plot  of  depth  along  the  vertical  axis  showing  the 
ground  plane  as  the  cluster  of  points  forming  a  line 
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Figure  14:  Obstacle  detection:  (a)  and  (b)  two  consecutive  frames  of  the  sequence, 
(c)  computed  optical  flow,  (d)  inverse  depth  map,  (e)  deviation  from  ground  plane,  (f) 
located  obstacles 


(e)  (f) 


Figure  16:  3D  virtual  reality  model  building:  (a)  and  (b)  two  consecutive  frames  of  the  J7 
sequence,  (c)  computed  optical  flow,  (d)  inverse  depth  map,  (e)  automatically  segmented 
regions,  (f)  reconstructed  inverse  depth  map 
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(e) 


(f) 


Figure  17:  3D  VR  model  building:  (a)  reconstructed  image  from  normal  viewpoint  using 
3D  model  (distortion  in  the  images  is  caused  by  the  texture  map  not  being  orthorectified), 
(b-f)  image  generated  from  other  viewpoints:  (b)  above,  (c)  to  the  left,  (d)  to  the  right, 
(e)  ahead  and  (f)  further  ahead  of  the  normal 
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